# Weak SINDy: Galerkin-Based Data-Driven Model Selection

We present a weak formulation and discretization of the system discovery problem from noisy measurement data. This method of learning differential equations from data fits into a new class of algorithms that replace pointwise derivative approximations with linear transformations and a variance reduction technique. Our approach improves on the standard SINDy algorithm by orders of magnitude. We first show that in the noise-free regime, this so-called Weak SINDy (WSINDy) framework is capable of recovering the dynamic coefficients to very high accuracy, with the number of significant digits equal to the tolerance of the data simulation scheme. Next we show that the weak form naturally accounts for white noise by identifying the correct nonlinearities with coefficient error scaling favorably with the signal-to-noise ratio while significantly reducing the size of linear systems in the algorithm. In doing so, we combine the ease of implementation of the SINDy algorithm with the natural noise-reduction of integration to arrive at a more robust and user-friendly method of sparse recovery that correctly identifies systems in both small-noise and large-noise regimes.

## Authors

• 6 publications
• 6 publications
05/09/2020

### Weak SINDy: A Data-Driven Galerkin Method for System Identification

We present a weak formulation and discretization of the system discovery...
07/06/2020

### Weak SINDy For Partial Differential Equations

We extend the WSINDy (Weak SINDy) method of sparse recovery introduced p...
11/08/2021

### A toolkit for data-driven discovery of governing equations in high-noise regimes

We consider the data-driven discovery of governing equations from time-s...
07/14/2020

### The impact of signal-to-noise, redshift, and angular range on the bias of weak lensing 2-point functions

Weak lensing data follow a naturally skewed distribution, implying the d...
02/02/2021

### Robust data-driven discovery of partial differential equations with time-dependent coefficients

In this work, we propose a robust Bayesian sparse learning algorithm bas...
07/20/2020

### "Self-Wiener" Filtering: Non-Iterative Data-Driven Robust Deconvolution of Deterministic Signals

We consider the fundamental problem of robust deconvolution, and particu...
10/17/2021

### Exploitation of error correlation in a large analysis validation: GlobCurrent case study

An assessment of variance in ocean current signal and noise shared by in...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Problem Statement

Consider a first-order dynamical system in dimensions of the form

 ddtx(t)=F(x(t)),x(0)=x0∈RD,0≤t≤T, (1.1)

and measurement data given at timepoints by

 ymd=xd(tm)+ϵmd,m∈[M], d∈[D],

where throughout we use the bracket notation . The matrix represents i.i.d. measurement noise. The focus of this article is the reconstruction of the dynamics (1.1) from the measurements .

The SINDy algorithm (Sparse Identification of Nonlinear Dynamics [4]) has been shown to be successful in solving this problem for sparsely represented nonlinear dynamics when noise is small and dynamic scales do not vary across multiple orders of magnitude. This framework assumes that the function in (1.1) is given component-wise by

 Fd(x(t))=J∑j=1w⋆jdfj(x(t)) (1.2)

for some known family of functions and a sparse weight matrix . The problem is then transformed into solving for by building a data matrix given by

 Θ(y)mj=fj(ym),ym=(ym1,…,ymD),

so that the candidate functions are directly evaluated at the noisy data. Solving (1.1) for then reduces to solving

 ˙y=Θ(y)ˆw (1.3)

for a sparse weight matrix , where is the numerical time derivative of the data . Sequentially-thresholded least squares is then used to arrive at a sparse solution.

The automatic creation of an accurate mathematical model from data is a challenging task and research into statistically rigorous model selection can be traced back to Akaike’s seminal work in the 1970’s [1, 2]. In the last 20 years, there has been substantial work in this area at the interface between applied mathematics and statistics (see [3, 8, 9, 13, 15, 16] for both theory and applications). More recently, the formulation of system discovery problems in terms of a candidate basis of nonlinear functions (1.2) and subsequent discretization (1.3) was introduced in [14] in the context of catastrophe prediction. The authors of [14] used compressed sensing techniques to enforce sparsity. Since then there has been an explosion of interest in the problem of identifying nonlinear dynamical systems from data, with some of the primary techniques being Gaussian process regression [10]

, deep neural networks

[11][18, 19] and a variety of methods from numerical analysis [6, 7]

. These techniques have been successfully applied to discovery of both ordrinary and partial differential equations. The variety of approaches qualitatitively differ in the interpretability of the resulting data-driven dynamical system, the practicality of the algorithm, and the robustness due to noise, scale separation, etc. For instance, a neural-network based data-driven dynamical system does not easily lend itself to physical interpretation. As well, certain sparsification techniques are not practical to the general scientific community, where the problem of system identification from data is ubiquitous. The SINDy algorithm allows for direct interpretations of the dynamics from identified differential equations and uses sequentially thresholded least-squares to enforce sparsity, which is not nearly as robust as other approaches but is easy to implement and has been proven to converge to sparse local minimizers in

[17]. Therefore, for simplicity we use sequential thresholding in this article to demonstrate the viability of our proposed weak formulation. Naturally one could investigate using a more robust sparsification strategy.

The aim of the present article is to provide rigorous justification for using the weak formulation of the dynamics in place of local pointwise derative approximations, as well as a robust algorithm for doing so. As such, we restrict numerical experiments to autonomous ordinary differential equations for their immenability to analysis. Natural next steps are to explore identification of PDEs and non-autonomous dynamical systems. We note that the use of integral equations for system identification was introduced in

[12], where compressed sensing techniques were used to enforce sparsity, and that this technique can be seen as a special case of the method introduced here. In Section 2 we introduce the algorithm with analysis of the resulting error structure and in Section 3 we provide numerical experimentation with a range of nonlinear systems. In Section 4, we provide concluding remarks including a brief comparison between WSINDy and conventional SINDy as well as natural next directions for this line of research.

## 2. Weak SINDy

We approach the problem of system identification (1.3) from a non-standard perspective by utilizing the weak form of the differential equation. Recall that for any smooth test function (absolutely continuous is sufficient) and interval , equation (1.1) admits the weak formulation

 ϕ(b)x(b)−ϕ(a)x(a)−∫baϕ′(u)x(u)du=∫baϕ(u)F(x(u))du,0≤a

With , we arrive at the integral equation of the dynamics explored in [12]. If we instead take to be non-constant and compactly supported in , we arrive at

 −∫baϕ′(u)x(u)du=∫baϕ(u)F(x(u))du. (2.2)

We then define the generalized residual for a given test function by replacing with a candidate element from the span of and with as follows:

 R(w;ϕ):=∫ba(ϕ′(u)y(u)+ϕ(u)(J∑j=1wjfj(y(u))))du. (2.3)

Clearly with and we have for all compactly-supported in ; however, is a discrete set of data, hence (2.3) can at best be approximated numerically, with measurement noise presenting a significant barrier to accurate indentification of .

### 2.1. Method Overview

For analogy with traditional Galerkin methods, consider the forward problem of solving a dynamical system such as (1.1) for . The Galerkin approach is to seek a solution represented in a chosen trial basis such that the residual , defined by

 R=∫ϕ(t)(˙x(t)−F(x(t)))dt,

is minimized over all test functions living in the span of a given test function basis . If the trial and test function bases are known analytically, inner products of the form appearing in the residual can be computed exactly. Thus, the computational error results only from representing the solution in a finite-dimensional function space.

The method we present here can be considered a data-driven Galerkin method of solving for where the trial “basis” is given by the set of gridfunctions evaluated at the data and only the test-function basis is known analytically. In this way, inner products appearing in must be approximated numerically, implying that the accuracy of the recovered weights is ultimately limited by the quadrature scheme used to discretize inner products. Using Lemma 2 below, we show that the correct coefficients may be recovered to effective machine precision accuracy (given by the tolerance of the forward ODE solver) from noise-free trajectories by discretizing (2.2) using the trapezoidal rule and choosing to decay smoothly to zero at the boundaries of its support. In this article we demonstrate this fact by choosing test functions from a particular family of unimodal piece-wise polynomials defined in (2.6).

Having chosen a quadrature scheme, the next accuracy barrier is presented by measurement noise, which introduces a bias in the weights. Below we analyze the distribution of the residuals to arrive at a generalized least squares approach where an approximate covariance matrix can be computed directly from the test functions. This analysis also shows that placing test functions near steep gradients in the dynamics improves recovery, hence we develop a self-consistent and stable algorithm for constructing a test function basis adaptively near these regions which also does not rely on pointwise approximation of derivatives. Overall, we show that when noise is present, our method produces a recovered weight matrix with the number of significant digits scaling optimally with the signal-to-noise ratio (defined below).

###### Remark 1.

The weak formulation of the dynamics introduces a wealth of information: given timepoints , equation (2.2) affords residuals over all possible supports with . Of course, one could also assimilate the responses of multiple families of test functions ; however, the computational complexity of such an exhaustive approach quickly becomes intractable. We stress that even with large noise, our proposed method identifies the correct nonlinearities with accurate weight recovery while keeping the number of test functions much lower than the number of timepoints ().

### 2.2. Algorithm: Weak SINDy

We state here the Weak SINDy algorithm in full generality. We propose a generalized least squares approach with approximate covariance matrix . Below we derive a particular choice of which utilizes the action of the test functions on the data . Sequential thresholding on the weight coefficients with thresholding parameter is used to enforce sparsity. In addition, an -regularization term with coefficient is included for problems involving rank deficiency. Methods of choosing optimal and are not included in this study. We note that is necessary for recovery and that with low noise our method is not sensitive to . Throughout we mostly set , however some problems do require regularization, such as the Lotka-Volterra system.

:

1. Construct matrix of trial gridfunctions

2. Construct integration matrices , such that

 Vkm=Δtϕk(tm),V′km=Δtϕ′k(tm)
3. Compute Gram matrix and right-hand side so that and

4. Solve the generalized least-squares problem with -regularization

 ˆw=argminw{(Gw−b)TΣ−1(Gw−b)+γ2∥w∥22},

using sequential thresholding with parameter to enforce sparsity.

With this as our core algorithm, we can now consider a residual analysis (Section 2.3) leading to a weighted least squares solution. We can also develop theoretical results related to the test functions (Section 2.4), yielding a more thorough understanding of the impact of using uniform (Section 2.4.1) and adaptive (Section 2.4.2) placement of test functions along the time axis.

### 2.3. Residual Analysis

Performance of WSINDy is determined by the behavior of the residuals

 R(w;ϕk):=(Gw−b)k ∈ R1×D,

denoted for the entire residual matrix. Here we analyze the residual to highlight key aspects for future analysis, as well as to arrive at an appropriate choice of approximate covariance . We also provide a heurisic argument in favor of placing test functions near steep gradients in the dynamics.

A key difficulty in recovering the true weights is that in general, due to nonlinearities present in , thus the recovered weights will be inherently biased. Nevertheless, we can isolate the dominant error terms by expanding out the residual and linearizing around the true trajectory :

 R(w;ϕk) =⟨ϕk, Θ(y)w⟩+⟨ϕ′k, y⟩ =⟨ϕk, Θ(y)(w−w⋆)⟩+⟨ϕk, Θ(y)w⋆⟩+⟨ϕ′k, y⟩

where . The errors manifest in the following ways:

• is the misfit between and

• results from measurement error in trial gridfunctions

• results from replacing with in the left-hand side of (2.2)

• is the integration error

• is the remainder term in the truncated Taylor expansion of around :

 F(ym)=F(x(tm))+ϵm∇F(x(tm))+O(|ϵm|2).

Clearly, recovery of when is straight forward: and are the only error terms, thus one only needs to select a quadrature scheme that ensures that the integration error is negligible and will be the minimizer. A primary focus of this study is the use of a specific family of piecewise polynomial test functions defined below for which the trapezoidal rule is highly accurate (see Lemma 2). Figure 3.1 demonstrates this fact on noise-free data.

For , accurate recovery of

requires one to choose hyperparameters that exemplify the true misfit term

by enforcing that the other error terms are of lower order. We look for and that approximately enforce , justifying the least squares approach. In the next subsection we address the issue of approximating the covariance matrix, providing justification for using

. The following subsection provides a heuristic argument for how to reduce corruption from the error terms

and by placing test functions near steep gradients in the data.

#### 2.3.1. Approximate CovarianceΣ

Neglecting and , we can rewrite with and together as

 R(w;ϕk)=R1+Zk=R1+M∑m=1ϵmTmΔt

where

 Tm=ϕ′k(tm)ID+ϕk(tm)∇F(x(tm))

is the linearized operator left-multiplying the noise vector

 ϵm=(ϵ1m ϵ2m … ϵDm)

at timestep and is the identity in . The true distribution of therefore depends on , which is not known a priori. For a leading order approximation we propose using which holds if . We then get that the columns of (corresponding to errors in each component along the time series) are approximately i.i.d normal with mean zero and covariance . For this reason, we adopt the heuristic , or .

Next we show that by localizing around large , we get an approximate cancellation of the error terms and . Consider the one-dimensional case () where is an arbitrary time index and is an observation. When is large compared to , we approximately have

 ym=x(tm)+ϵm≈x(tm+δt)≈x(tm)+δtF(x(tm)) (2.4)

for some small , i.e. the perturbed value lands close to the true trajectory at the time . To understand the heuristic behind this approximation, let be the point of intersection between the tangent line to at and . Then

 δt=ϵ˙x(tm),

hence implies that will approximately lie on the true trajectory. As well, regions where is small will not yield accurate recovery in the case of noisy data, since perturbations are more likely to exit the relevant region of phase space. If we linearize using the approximation (2.4) we get

 (2.5)

Assuming is sufficiently localized around , (2.4) also implies that

hence , while (2.5) implies

 ⟨ϕk,Θ(y)w⟩ =⟨ϕk,Θ(y)(w−w⋆)⟩=R2+⟨ϕk,F(y)⟩ =⟨ϕk,Θ(y)(w−w⋆)⟩+⟨ϕk,F(x)⟩−δt⟨ϕ′k,F(x)⟩

having integrated by parts. Collecing the terms together yields that the residual takes the form

 R(w;ϕk)=⟨ϕ′k,y⟩+⟨ϕk,Θ(y)w⟩≈R2,

and we see that and have effectively canceled. In higher dimensions this interpretation does not appear to be as illuminating, but nevertheless, for any given coordinate , it does hold that terms in the error expansion vanish around points where is large, precisely because .

### 2.4. Test Function Basis (ϕk)k∈[K]

Here we introduce a test function space and quadrature scheme to minimize integration errors and enact the heuristic arguments above, which rely on sufficiently localized and satisfying . We define the space of unimodal piece-wise polynomials of the form

 ϕ(t)={C(t−a)p(b−t)qt∈(a,b),0otherwise, (2.6)

where satisfies and . The normalization

 C=1ppqq(p+qb−a)p+q

ensures that . Functions are non-negative, unimodal, and compactly supported in with continuous derivatives. Larger and imply faster decay towards the endpoints of the support.

To ensure the integration error in approximating inner products is negligible, we rely on the following lemma, which provides a bound on the error in discretizing the weak derivative relation

 −∫ϕ′fdt=∫ϕf′dt (2.7)

using the trapezoidal rule for compactly supported . Following the lemma we introduce two strategies for choosing the parameters of the test functions .

###### Lemma 2 (Numerical Error in Weak Derivatives).

Let have continuous derivatives of order and define . Then if has roots of multiplicity , then

 Δt2N−1∑j=0[g(tj)+g(tj+1)]=O(Δtp+1), (2.8)

where . In other words, the composite trapezoidal rule discretizes the weak derivative relation (2.7) to order .

###### Proof.

This is a simple consequence of the Euler-Maclaurin formula. If is a smooth function, then

 Δt2N−1∑j=0[g(tj)+g(tj+1)]∼∫bag(t)dt+∞∑k=1Δt2kB2k(2k)!(g(2k−1)(b)−g(2k−1)(a)),

where are the Bernoulli numbers. The asymptotic expansion provides corrections to the trapezoidal rule that realize machine precision accuracy up until a certain value of , after which terms in the expansion grow and the series diverges [5, Ch.  3]. In our case, where the root conditions on imply that

 ∫bag(t)dt=0andg(k)(b)=g(k)(a)=0,0≤k≤p−1.

So for odd, we have that

 Δt2N−1∑j=0[g(tj)+g(tj+1)] ∼∞∑k=(p+1)/2Δt2kB2k(2k)!(g(2k−1)(b)−g(2k−1)(a)) =Bp+1(p+1)!(ϕ(p)(b)f(b)−ϕ(p)(a)f(a))Δtp+1+O(Δtp+2).

For even , the leading term is with a slightly different coefficient. ∎

For with , the exact leading order error in term in (2.8) is

 2pBp+1p+1(f(b)−f(a))Δtp+1,

which is negligible for a wide range of reasonable and values. The Bernoulli numbers eventually start growing like , but for smaller values of they are moderate. For instance, with and , this error term is up until , where it takes the value , while for , the error is below machine precision for all between 7 and 819. For these reasons, in what follows we choose test functions and discretize all integrals using the trapezoidal rule. Unless otherwise stated, each function satsifies and so is fully determined by the tuple indicating its polynomial degree and support. In the next two subsections we propose two different strategies for determining using the data .

#### 2.4.1. Strategy 1: Uniform Grid

The simplest strategy for choosing a test function basis is to place uniformly on the interval with fixed degree and support size denoting the number of timepoints in that each is supported on. Specifically we propose the following three steps to automate this process, requiring only that the user specify the polynomial degree and the total number of basis functions , through the values of two hyperparameters and that relate to the residual analysis above.

Step 1: Choosing : We propose to fix the support size of each to be

 L=12(Margmaxn|Y|n) (2.9)

where is the magnitude of the th Fourier mode of minus its mean. In this way the action of the test functions exploits that large changes in the dynamics are most likely to occur within time intervals of length equal to the largest Fourier mode.

Step 2: Determining : In light of the derivation above of the approximate covariance matrix , we define , where larger indicates better agreement with . The value of selects the polynomial degree as follows: analytically,

 ρ=2√2p−1b−a(1−1/p1−1/2p)p−1=1.67…b−a(p1/2+p−1/2)+o(p−1/2).

As a leading order approximate we set .

Step 3: Determining : Next we introduce the shift parameter defined by

 s=ϕk(t∗)=ϕk+1(t∗),

which determines from and . In words, is the height of intersection between and and measures the amount of overlap between successive test functions, which factors into the covariance matrix . This fixes for each pair of neighboring test functions. Larger implies that neighboring basis functions overlap on more points, with indicating that . Specifically, neighboring basis functions overlap on timepoints.

In Figures 3.2 and 3.3 we vary the parameters and and observe that results agree with intuition: larger and larger lead to better recovery of .

:

1. Construct matrix of trial gridfunctions

2. Construct integration matrices , such that

 Vkm=Δtϕk(tm),V′km=Δtϕ′k(tm)

with the test functions determined by as described above

3. Compute Gram matrix and right-hand side so that and

4. Compute approximate covariance and Cholesky factorization

5. Solve the generalized least-squares problem with -regularization

 ˆw=argminw{(Gw−b)TΣ−1(Gw−b)+γ2∥w∥22},

using sequential thresholding with parameter to enforce sparsity.

#### 2.4.2. Strategy 2: Adaptive Grid

Motivated by the arguments above, we now introduce an algorithm for constructing a test function basis localized near points of large change in the dynamics. This occurs in three steps: 1) construct a weak approximation to the derivative of the dynamics , 2) sample points c from a cumulative distribution with density proportional to the total variation , 3) construct test functions centered at c using a width-at-half-max parameter to determine the parameters of each basis function . Each of these steps is numerically stable and carried out independently along each coordinate of the dynamics. A visual diagram is provided in Figure 2.1.

Step 1: Weak Derivative Approximation: Define , where the matrix enacts a linear convolution with the derivative of a piece-wise polynomial test function of degree and support size so that

 vm=−⟨ϕ′,y⟩=⟨ϕ,˙y⟩≈˙ym.

The parameters and are chosen by the user, with and corresponding to taking a centered finite difference derivative with 3-point stencil. Larger results in more smoothing and minimizes the corruption from noise while still capturing the correct large deviations in the dynamics. For all examples we let and and note that greater disparity between and

results in more pronounced localization (less uniform distribution) of test functions.

Step 2: Selecting c: Having computed , define to be the cumulative sum of normalized so that . In this way

is a valid cumulative distribution function with density proportional to the total variation of

. We then find c by sampling form . Let with being the number of the test functions, we then define , or numerically,

 ck=min{t∈t : ψ(t)≥Uk}.

This stage requires the user to select the number of test functions .

Step 3: Construction of Test functions : Having chosen the location of the centerpoint for each test function , we are left to choose the degree of the polynomial and the supports . The degree is chosen according to the width-at-half-max parameter , which specifies the difference in timepoints between each center and , while the supports are chosen such that . This gives us a nonlinear system of two equations in two unknowns which can be easily solved (i.e. using MATLAB’s fzero). This can be done for one reference test functions and the rest of the weights obtained by translation.

The adaptive grid Weak SINDy algorithm is summarized as follows:

:

1. Construct matrix of trial gridfunctions

2. Construct integration matrices , such that

 Vkm=Δtϕk(tm),V′km=Δtϕ′k(tm),

with test functions determined by as described above

3. Compute Gram matrix and right-hand side so that and

4. Compute approximate covariance and Cholesky factorization

5. Solve the generalized least-squares problem with -regularization

 ˆw=argminw{(Gw−b)TΣ−1(Gw−b)+γ2∥w∥22},

using sequential thresholding with parameter to enforce sparsity.

The parameters and play a role in determining how localized the test function basis is around steep gradients and ultimately depend on the timestep . As mentioned above, we set and throughout as this produces sufficient localization for the examples featured in this article. For simplicity we fix the number of test functions to be a multiple of the number of trial functions (i.e. etc.). For larger noise it is necessary to use a larger basis, while for small noise it as often sufficient to set . The optimal value of

depends on the timescales of the dynamics and can be chosen from the data using the Fourier transform as in the uniform grid case, however for simplicity we set

throughout.

## 3. Numerical Experiments

We now show that WSINDy is capable of recovering the correct dynamics to high accuracy over a range of signal-to-noise ratios. To generate true trajectory data we use MATLAB’s ode45 with absolute and relative tolerance e and sampling rate of unless otherwise specified. We choose a fixed sampling rate for simplicity and uniformity across examples but note that a detailed study of the dependency of the algorithm on is not presented here. While the results in this section hold for a wide range of (results not presented here), different sampling rates may result in different choices of hyperparameters (e.g., and in the case of WSINDy_UG). White Gaussian noise with mean zero and variance is then added to the exact trajectories, where is computed by specifying the signal-to-noise ratio and setting

In this way, . We examine the following canonical nonlinear systems with variations in the specified parameters:

 Duffing {˙x1=x2,˙x2=−μx2−αx1−βx31, variable β, fixed μ=0.2,α=0.05,x(0)=[02] Van der Pol {˙x1=x2,˙x2=βx2(1−x21)−x1, variable β, fixed x(0)=[01] Lotka-Volterra {˙x1=αx1−βx1x2,˙x2=βx1x2−2αx2, variable β, fixed α=1,x(0)=[12] Lorenz ⎧⎨⎩˙x1=σ(x2−x1),˙x2=x1(ρ−x3)−x2,˙x3=x1x2−βx3, variable x(0), fixed σ=10,β=8/3,ρ=28

The Duffing equation and Van der Pol oscillator present cases of approximately linear systems with cubic nonlinearities. Solutions to the Van der Pol oscillator and Lotka-Volterra system exhibit orbits with variable speed of motion, in particular regions with rapid change between regions of very litte variation. For the Lorenz system, we focus on recovering the system in the chaotic regime. For this reason we fix the parameters of the differential equation to lie in the region with large Lyupanov exponents and vary the initial conditions. The initial conditions are chosen from a uniform distribution, and , which covers the strange attractor.

### 3.1. Noise-Free Data

The goal of the noise-free experiments is to examine the effect of increasing the polynomial degree of the test functions to show that convergence to machine precision is realized in the limit of large (i.e. convergence to within the accuracy tolerance of the ODE solver), as expected from Lemma 2. Indeed, Figure 3.1 shows that in the zero-noise case (), WSINDy recovers the correct weight matrix to within the tolerance of the ODE solver (fixed at ) over a wide range of parameter values for each of the dynamical systems above. We find that accurate recovery occurs regardless of sparsity enforcement or regularization, and so we set throughout, orders of magnitude below any of the true coefficients , and . For the data-driven trial basis , we include all polynomials up to degree 5 in the state variables as well as , for and . In addition, we find that recovery occurs with the minimal number of basis functions such that the Gram matrix is square. We use the uniform grid approach above with support selected from the Fourier transform of and shift parameter fixed to ensure that .

### 3.2. Small-Noise Regime

We now turn to the case of low to moderate noise levels, examining a signal-to-noise ratio in the range . In Figures 3.2 and 3.3 we observe another nice property of WSINDy, that the error in the coefficients scales with , in that the recovered coefficients have approximately significant digits.

We again use the uniform grid approach. We examine not only the polynomial degree but the number of basis functions used in recovery. To reiterate the arguments above, the magnitude of compared to affects the distribution of the residual, so we define and define the degree by fixing and then calculating . In this way, increasing corresponds to increasing . We look at which corresponds roughly to . This together with the spacing parameter determines the test function basis. We enforce that two neighboring basis functions and intersect at a height of , so that with , the two functions perfectly overlap, and with their supports are disjoint. In this way, larger corresponds to higher . We examine . For (featured in both Figures) we note that as is varied from to , the number of basis functions ranges from to , or to . In each case we set the sparsity and regularization parameters to and and use a trial basis consisting of all polynomials up to degree 5 in the state variables.

We simulated 200 instantiations of noise for the Duffing equation and Van der Pol oscillator and for each noisy trajectory examined a range of the parameter values and . As one might expect form the noise-free case above, increasing leads monotonically to better recovery. In addition, increasing

also leads to better recovery. The mean and standard deviation of the coefficient error

are also pictured, along with sample trajectories from the resulting data-driven dynamical systems, denoted by .

### 3.3. Large-Noise Regime

Figures 3.4 to 3.7 show that Strategy 2 (non-uniform grid) can be employed to discover the dynamics in the large noise regime. The signal to noise ratio is for the Duffing, Van der Pol and Lorenz equations, and for Lotka-Volterra. In each case we set the weak differentiation parameters to and and the width-at-half-max to timepoints. For the 2D systems, we use test basis functions, while for the Lorenz equation were used. In each case the correct terms were identified with relative coefficient error less than , indicating approximately two significant digits. We plot the noisy data , the true data and the simulated data-driven dynamical systems in dynamo view and phase space to exemplify the separation of scales and the severity of the corruption from noise. We extend by to show that the data-driven system captures the expected limiting behavior. Unless otherwise specified, we set the sparsity and regularization parameters to and and use a trial basis consisting of all polynomials up to degree 5 in the state variables.

For the Duffing equation (Figure 3.4), the data-driven trajectory diverges slightly from the true data as the system relaxes to equilibrium but is qualitatively correct. The recovered Van der Pol oscillator (Figure 3.5) identifies the correct limit cycle but the dominant timescale of is slightly shorter than the true data . For this reason slowly diverges pointwise over time but does not stray from the relevant regions of phase space. This reflects that more accurate measurements are needed to recover systems with abrupt changes. The recovered Lotka-Volterra trajectory (Figure 3.6) is nearly indistiguishable from the true data, but we note that here regularization was used (), as the system was nearly singular, and the columns of were normalized using the 2-norm. For the Lorenz attractor (Figure 3.7), the recovered trajectory remains close to the true trajectory up until around , after which the two diverge as is expected from chaotic dynamics. Nevertheless the Lorenz attractor is captured.