1. Problem Statement
Consider a firstorder dynamical system in dimensions of the form
(1.1) 
and measurement data given at timepoints by
where throughout we use the bracket notation . The matrix represents i.i.d. measurement noise. The focus of this article is the reconstruction of the dynamics (1.1) from the measurements .
The SINDy algorithm (Sparse Identification of Nonlinear Dynamics [4]) has been shown to be successful in solving this problem for sparsely represented nonlinear dynamics when noise is small and dynamic scales do not vary across multiple orders of magnitude. This framework assumes that the function in (1.1) is given componentwise by
(1.2) 
for some known family of functions and a sparse weight matrix . The problem is then transformed into solving for by building a data matrix given by
so that the candidate functions are directly evaluated at the noisy data. Solving (1.1) for then reduces to solving
(1.3) 
for a sparse weight matrix , where
is the numerical time derivative of the data . Sequentiallythresholded
least squares is then used to arrive at a sparse solution.
The automatic creation of an accurate mathematical model from data is a challenging task and research into statistically rigorous model selection can be traced back to Akaike’s seminal work in the 1970’s [1, 2]. In the last 20 years, there has been substantial work in this area at the interface between applied mathematics and statistics (see [3, 8, 9, 13, 15, 16] for both theory and applications). More recently, the formulation of system discovery problems in terms of a candidate basis of nonlinear functions (1.2) and subsequent discretization (1.3) was introduced in [14] in the context of catastrophe prediction. The authors of [14] used compressed sensing techniques to enforce sparsity. Since then there has been an explosion of interest in the problem of identifying nonlinear dynamical systems from data, with some of the primary techniques being Gaussian process regression [10]
, deep neural networks
[11][18, 19] and a variety of methods from numerical analysis [6, 7]. These techniques have been successfully applied to discovery of both ordrinary and partial differential equations. The variety of approaches qualitatitively differ in the interpretability of the resulting datadriven dynamical system, the practicality of the algorithm, and the robustness due to noise, scale separation, etc. For instance, a neuralnetwork based datadriven dynamical system does not easily lend itself to physical interpretation. As well, certain sparsification techniques are not practical to the general scientific community, where the problem of system identification from data is ubiquitous. The SINDy algorithm allows for direct interpretations of the dynamics from identified differential equations and uses sequentially thresholded leastsquares to enforce sparsity, which is not nearly as robust as other approaches but is easy to implement and has been proven to converge to sparse local minimizers in
[17]. Therefore, for simplicity we use sequential thresholding in this article to demonstrate the viability of our proposed weak formulation. Naturally one could investigate using a more robust sparsification strategy.The aim of the present article is to provide rigorous justification for using the weak formulation of the dynamics in place of local pointwise derative approximations, as well as a robust algorithm for doing so. As such, we restrict numerical experiments to autonomous ordinary differential equations for their immenability to analysis. Natural next steps are to explore identification of PDEs and nonautonomous dynamical systems. We note that the use of integral equations for system identification was introduced in
[12], where compressed sensing techniques were used to enforce sparsity, and that this technique can be seen as a special case of the method introduced here. In Section 2 we introduce the algorithm with analysis of the resulting error structure and in Section 3 we provide numerical experimentation with a range of nonlinear systems. In Section 4, we provide concluding remarks including a brief comparison between WSINDy and conventional SINDy as well as natural next directions for this line of research.2. Weak SINDy
We approach the problem of system identification (1.3) from a nonstandard perspective by utilizing the weak form of the differential equation. Recall that for any smooth test function (absolutely continuous is sufficient) and interval , equation (1.1) admits the weak formulation
(2.1) 
With , we arrive at the integral equation of the dynamics explored in [12]. If we instead take to be nonconstant and compactly supported in , we arrive at
(2.2) 
We then define the generalized residual for a given test function by replacing with a candidate element from the span of and with as follows:
(2.3) 
Clearly with and we have for all compactlysupported in ; however, is a discrete set of data, hence (2.3) can at best be approximated numerically, with measurement noise presenting a significant barrier to accurate indentification of .
2.1. Method Overview
For analogy with traditional Galerkin methods, consider the forward problem of solving a dynamical system such as (1.1) for . The Galerkin approach is to seek a solution represented in a chosen trial basis such that the residual , defined by
is minimized over all test functions living in the span of
a given test function basis . If the trial
and test function bases are known analytically, inner products of
the form appearing in the residual
can be computed exactly. Thus, the computational error results only
from representing the solution in a finitedimensional function space.
The method we present here can be considered a datadriven Galerkin
method of solving for where the trial “basis” is
given by the set of gridfunctions
evaluated at the data and only the testfunction basis
is known analytically. In this way, inner products appearing in
must be approximated numerically, implying that the accuracy of the
recovered weights is ultimately limited by
the quadrature scheme used to discretize inner products. Using Lemma
2 below, we show that the correct coefficients
may be recovered to effective machine precision accuracy (given by
the tolerance of the forward ODE solver) from noisefree trajectories
by discretizing (2.2) using the trapezoidal
rule and choosing to decay smoothly to zero at the boundaries
of its support. In this article we demonstrate this fact by choosing
test functions from a particular family of unimodal piecewise polynomials
defined in (2.6).
Having chosen a quadrature scheme, the next accuracy barrier is presented
by measurement noise, which introduces a bias in the weights. Below
we analyze the distribution of the residuals
to arrive at a generalized least squares approach where an approximate
covariance matrix can be computed directly from the test functions.
This analysis also shows that placing test functions near steep gradients
in the dynamics improves recovery, hence we develop a selfconsistent
and stable algorithm for constructing a test function basis adaptively
near these regions which also does not rely on pointwise approximation
of derivatives. Overall, we show that when noise is present, our method
produces a recovered weight matrix with the
number of significant digits scaling optimally with the signaltonoise
ratio (defined below).
Remark 1.
The weak formulation of the dynamics introduces a wealth of information: given timepoints , equation (2.2) affords residuals over all possible supports with . Of course, one could also assimilate the responses of multiple families of test functions ; however, the computational complexity of such an exhaustive approach quickly becomes intractable. We stress that even with large noise, our proposed method identifies the correct nonlinearities with accurate weight recovery while keeping the number of test functions much lower than the number of timepoints ().
2.2. Algorithm: Weak SINDy
We state here the Weak SINDy algorithm in full generality. We propose
a generalized least squares approach with approximate covariance matrix
. Below we derive a particular choice of which
utilizes the action of the test functions
on the data . Sequential thresholding on the weight coefficients
with thresholding parameter is used to enforce
sparsity. In addition, an regularization term with coefficient
is included for problems involving rank deficiency. Methods
of choosing optimal and are not included in this
study. We note that
is necessary for recovery and that with low noise our method is not
sensitive to . Throughout we mostly set , however
some problems do require regularization, such as the LotkaVolterra
system.
:

Construct matrix of trial gridfunctions

Construct integration matrices , such that

Compute Gram matrix and righthand side so that and

Solve the generalized leastsquares problem with regularization
using sequential thresholding with parameter to enforce sparsity.
With this as our core algorithm, we can now consider a residual analysis (Section 2.3) leading to a weighted least squares solution. We can also develop theoretical results related to the test functions (Section 2.4), yielding a more thorough understanding of the impact of using uniform (Section 2.4.1) and adaptive (Section 2.4.2) placement of test functions along the time axis.
2.3. Residual Analysis
Performance of WSINDy is determined by the behavior of the residuals
denoted for the
entire residual matrix. Here we analyze the residual to highlight
key aspects for future analysis, as well as to arrive at an appropriate
choice of approximate covariance . We also provide a heurisic
argument in favor of placing test functions near steep gradients in
the dynamics.
A key difficulty in recovering the true weights is that in general, due to nonlinearities present in , thus the recovered weights will be inherently biased. Nevertheless, we can isolate the dominant error terms by expanding out the residual and linearizing around the true trajectory :
where . The errors manifest in the following ways:

is the misfit between and

results from measurement error in trial gridfunctions

results from replacing with in the lefthand side of (2.2)

is the integration error

is the remainder term in the truncated Taylor expansion of around :
Clearly, recovery of when is straight
forward: and are the only error terms, thus one
only needs to select a quadrature scheme that ensures that the integration
error is negligible and
will be the minimizer. A primary focus of this study is the use of
a specific family of piecewise polynomial test functions
defined below for which the trapezoidal rule is highly accurate (see
Lemma 2). Figure 3.1 demonstrates this fact
on noisefree data.
For , accurate recovery of
requires one to choose hyperparameters that exemplify the true misfit term
by enforcing that the other error terms are of lower order. We look for and that approximately enforce , justifying the least squares approach. In the next subsection we address the issue of approximating the covariance matrix, providing justification for using. The following subsection provides a heuristic argument for how to reduce corruption from the error terms
and by placing test functions near steep gradients in the data.2.3.1. Approximate Covariance
Neglecting and , we can rewrite with and together as
where
is the linearized operator leftmultiplying the noise vector
at timestep and is the identity in . The true distribution of therefore depends on , which is not known a priori. For a leading order approximation we propose using which holds if . We then get that the columns of (corresponding to errors in each component along the time series) are approximately i.i.d normal with mean zero and covariance . For this reason, we adopt the heuristic , or .
2.3.2. Adaptive Refinement
Next we show that by localizing around large , we get an approximate cancellation of the error terms and . Consider the onedimensional case () where is an arbitrary time index and is an observation. When is large compared to , we approximately have
(2.4) 
for some small , i.e. the perturbed value lands close to the true trajectory at the time . To understand the heuristic behind this approximation, let be the point of intersection between the tangent line to at and . Then
hence implies that will approximately lie on the true trajectory. As well, regions where is small will not yield accurate recovery in the case of noisy data, since perturbations are more likely to exit the relevant region of phase space. If we linearize using the approximation (2.4) we get
(2.5) 
Assuming is sufficiently localized around , (2.4) also implies that
hence , while (2.5) implies
having integrated by parts. Collecing the terms together yields that the residual takes the form
and we see that and have effectively canceled. In higher dimensions this interpretation does not appear to be as illuminating, but nevertheless, for any given coordinate , it does hold that terms in the error expansion vanish around points where is large, precisely because .
2.4. Test Function Basis
Here we introduce a test function space and quadrature scheme to minimize integration errors and enact the heuristic arguments above, which rely on sufficiently localized and satisfying . We define the space of unimodal piecewise polynomials of the form
(2.6) 
where satisfies and . The normalization
ensures that . Functions
are nonnegative, unimodal, and compactly supported
in with continuous derivatives.
Larger and imply faster decay towards the endpoints of the
support.
To ensure the integration error in approximating inner products is negligible, we rely on the following lemma, which provides a bound on the error in discretizing the weak derivative relation
(2.7) 
using the trapezoidal rule for compactly supported . Following the lemma we introduce two strategies for choosing the parameters of the test functions .
Lemma 2 (Numerical Error in Weak Derivatives).
Let have continuous derivatives of order and define . Then if has roots of multiplicity , then
(2.8) 
where . In other words, the composite trapezoidal rule discretizes the weak derivative relation (2.7) to order .
Proof.
This is a simple consequence of the EulerMaclaurin formula. If is a smooth function, then
where are the Bernoulli numbers. The asymptotic expansion provides corrections to the trapezoidal rule that realize machine precision accuracy up until a certain value of , after which terms in the expansion grow and the series diverges [5, Ch. 3]. In our case, where the root conditions on imply that
So for odd, we have that
For even , the leading term is with a slightly different coefficient. ∎
For with , the exact leading order error in term in (2.8) is
which is negligible for a wide range of reasonable and values. The Bernoulli numbers eventually start growing like , but for smaller values of they are moderate. For instance, with and , this error term is up until , where it takes the value , while for , the error is below machine precision for all between 7 and 819. For these reasons, in what follows we choose test functions and discretize all integrals using the trapezoidal rule. Unless otherwise stated, each function satsifies and so is fully determined by the tuple indicating its polynomial degree and support. In the next two subsections we propose two different strategies for determining using the data .
2.4.1. Strategy 1: Uniform Grid
The simplest strategy for choosing a test function basis is to place uniformly on the interval with fixed degree and support size denoting the number of timepoints in that each is supported on. Specifically we propose the following three steps to automate this process, requiring only that the user specify the polynomial degree and the total number of basis functions , through the values of two hyperparameters and that relate to the residual analysis above.
Step 1: Choosing : We propose to fix the support size of each to be
(2.9) 
where is the magnitude of
the th Fourier mode of minus its mean. In this way
the action of the test functions exploits that large changes in the
dynamics are most likely to occur within time intervals of length
equal to the largest Fourier mode.
Step 2: Determining : In light of the derivation above of the approximate covariance matrix , we define , where larger indicates better agreement with . The value of selects the polynomial degree as follows: analytically,
As a leading order approximate we set .
Step 3: Determining : Next we introduce the shift parameter defined by
which determines from and . In words, is the height of intersection between and and measures the amount of overlap between successive test functions, which factors into the covariance matrix . This fixes for each pair of neighboring test functions. Larger implies that neighboring basis functions overlap on more points, with indicating that . Specifically, neighboring basis functions overlap on timepoints.
In Figures 3.2 and 3.3 we vary the parameters and and observe that results agree with intuition: larger and larger lead to better recovery of .
:

Construct matrix of trial gridfunctions

Construct integration matrices , such that
with the test functions determined by as described above

Compute Gram matrix and righthand side so that and

Compute approximate covariance and Cholesky factorization

Solve the generalized leastsquares problem with regularization
using sequential thresholding with parameter to enforce sparsity.
2.4.2. Strategy 2: Adaptive Grid
Motivated by the arguments above, we now introduce an algorithm for
constructing a test function basis localized near points of large
change in the dynamics. This occurs in three steps: 1) construct a
weak approximation to the derivative of the dynamics ,
2) sample points c from a cumulative distribution
with density proportional to the total variation ,
3) construct test functions centered at c using a widthathalfmax
parameter to determine the parameters
of each basis function . Each of these steps is numerically
stable and carried out independently along each coordinate of the
dynamics. A visual diagram is provided in Figure 2.1.
Step 1: Weak Derivative Approximation: Define , where the matrix enacts a linear convolution with the derivative of a piecewise polynomial test function of degree and support size so that
The parameters and are chosen by the user, with and corresponding to taking a centered finite difference derivative with 3point stencil. Larger results in more smoothing and minimizes the corruption from noise while still capturing the correct large deviations in the dynamics. For all examples we let and and note that greater disparity between and
results in more pronounced localization (less uniform distribution) of test functions.
Step 2: Selecting c: Having computed , define to be the cumulative sum of normalized so that . In this way
is a valid cumulative distribution function with density proportional to the total variation of
. We then find c by sampling form . Let with being the number of the test functions, we then define , or numerically,This stage requires the user to select the number of test functions
.
Step 3: Construction of Test functions :
Having chosen the location of the centerpoint for each test
function , we are left to choose the degree of
the polynomial and the supports . The degree is
chosen according to the widthathalfmax parameter , which
specifies the difference in timepoints between each center
and , while the supports are chosen
such that . This gives us a nonlinear
system of two equations in two unknowns which can be easily solved
(i.e. using MATLAB’s fzero). This can be done for one reference
test functions and the rest of the weights obtained by translation.
The adaptive grid Weak SINDy algorithm is summarized as follows:
:

Construct matrix of trial gridfunctions

Construct integration matrices , such that
with test functions determined by as described above

Compute Gram matrix and righthand side so that and

Compute approximate covariance and Cholesky factorization

Solve the generalized leastsquares problem with regularization
using sequential thresholding with parameter to enforce sparsity.
The parameters and play a role in determining how localized the test function basis is around steep gradients and ultimately depend on the timestep . As mentioned above, we set and throughout as this produces sufficient localization for the examples featured in this article. For simplicity we fix the number of test functions to be a multiple of the number of trial functions (i.e. etc.). For larger noise it is necessary to use a larger basis, while for small noise it as often sufficient to set . The optimal value of
depends on the timescales of the dynamics and can be chosen from the data using the Fourier transform as in the uniform grid case, however for simplicity we set
throughout.3. Numerical Experiments
We now show that WSINDy is capable of recovering the correct dynamics to high accuracy over a range of signaltonoise ratios. To generate true trajectory data we use MATLAB’s ode45 with absolute and relative tolerance e and sampling rate of unless otherwise specified. We choose a fixed sampling rate for simplicity and uniformity across examples but note that a detailed study of the dependency of the algorithm on is not presented here. While the results in this section hold for a wide range of (results not presented here), different sampling rates may result in different choices of hyperparameters (e.g., and in the case of WSINDy_UG). White Gaussian noise with mean zero and variance is then added to the exact trajectories, where is computed by specifying the signaltonoise ratio and setting
In this way, .
We examine the following canonical nonlinear systems with variations
in the specified parameters:
Duffing  
Van der Pol  
LotkaVolterra  
Lorenz 
The Duffing equation and Van der Pol oscillator present cases of approximately linear systems with cubic nonlinearities. Solutions to the Van der Pol oscillator and LotkaVolterra system exhibit orbits with variable speed of motion, in particular regions with rapid change between regions of very litte variation. For the Lorenz system, we focus on recovering the system in the chaotic regime. For this reason we fix the parameters of the differential equation to lie in the region with large Lyupanov exponents and vary the initial conditions. The initial conditions are chosen from a uniform distribution, and , which covers the strange attractor.
3.1. NoiseFree Data
The goal of the noisefree experiments is to examine the effect of increasing the polynomial degree of the test functions to show that convergence to machine precision is realized in the limit of large (i.e. convergence to within the accuracy tolerance of the ODE solver), as expected from Lemma 2. Indeed, Figure 3.1 shows that in the zeronoise case (), WSINDy recovers the correct weight matrix to within the tolerance of the ODE solver (fixed at ) over a wide range of parameter values for each of the dynamical systems above. We find that accurate recovery occurs regardless of sparsity enforcement or regularization, and so we set throughout, orders of magnitude below any of the true coefficients , and . For the datadriven trial basis , we include all polynomials up to degree 5 in the state variables as well as , for and . In addition, we find that recovery occurs with the minimal number of basis functions such that the Gram matrix is square. We use the uniform grid approach above with support selected from the Fourier transform of and shift parameter fixed to ensure that .
Duffing  Van der Pol 
LotkaVolterra  Lorenz 
V1  V2  V3  V4  Notes  
Duffing,  
Van der Pol,  
LotkaVolterra,  
Lorenz          , 
3.2. SmallNoise Regime
We now turn to the case of low to moderate noise levels, examining
a signaltonoise ratio in the range .
In Figures 3.2 and 3.3 we observe another
nice property of WSINDy, that the error in the coefficients scales
with , in that the recovered coefficients
have approximately significant digits.
We again use the uniform grid approach. We examine not only the polynomial
degree but the number of basis functions used in recovery.
To reiterate the arguments above, the magnitude of compared
to affects the distribution of the residual, so we define
and define the degree by fixing and then calculating
. In this way, increasing corresponds to increasing .
We look at which corresponds roughly to .
This together with the spacing parameter determines the test
function basis. We enforce that two neighboring basis functions
and intersect at a height of , so that with ,
the two functions perfectly overlap, and with their supports
are disjoint. In this way, larger corresponds to higher .
We examine . For (featured in both Figures) we
note that as is varied from to , the number of basis
functions ranges from to , or to .
In each case we set the sparsity and regularization parameters to
and and use a trial basis consisting
of all polynomials up to degree 5 in the state variables.
We simulated 200 instantiations of noise for the Duffing equation and Van der Pol oscillator and for each noisy trajectory examined a range of the parameter values and . As one might expect form the noisefree case above, increasing leads monotonically to better recovery. In addition, increasing
also leads to better recovery. The mean and standard deviation of the coefficient error
are also pictured, along with sample trajectories from the resulting datadriven dynamical systems, denoted by .




3.3. LargeNoise Regime
Figures 3.4 to 3.7 show that Strategy 2
(nonuniform grid) can be employed to discover the dynamics in the
large noise regime. The signal to noise ratio is
for the Duffing, Van der Pol and Lorenz equations, and
for LotkaVolterra. In each case we set the weak differentiation parameters
to and and the widthathalfmax to timepoints.
For the 2D systems, we use test basis functions, while
for the Lorenz equation were used. In each case the correct
terms were identified with relative coefficient error
less than , indicating approximately two significant digits.
We plot the noisy data , the true data
and the simulated datadriven dynamical systems
in dynamo view and phase space to exemplify the separation of scales
and the severity of the corruption from noise. We extend
by to show that the datadriven system captures the expected
limiting behavior. Unless otherwise specified, we set the sparsity
and regularization parameters to
and and use a trial basis consisting
of all polynomials up to degree 5 in the state variables.
For the Duffing equation (Figure 3.4), the datadriven trajectory diverges slightly from the true data as the system relaxes to equilibrium but is qualitatively correct. The recovered Van der Pol oscillator (Figure 3.5) identifies the correct limit cycle but the dominant timescale of is slightly shorter than the true data . For this reason slowly diverges pointwise over time but does not stray from the relevant regions of phase space. This reflects that more accurate measurements are needed to recover systems with abrupt changes. The recovered LotkaVolterra trajectory (Figure 3.6) is nearly indistiguishable from the true data, but we note that here regularization was used (), as the system was nearly singular, and the columns of were normalized using the 2norm. For the Lorenz attractor (Figure 3.7), the recovered trajectory remains close to the true trajectory up until around , after which the two diverge as is expected from chaotic dynamics. Nevertheless the Lorenz attractor is captured.


Comments
There are no comments yet.