Stability selection enables robust learning of partial differential equations from limited noisy data

07/17/2019
by   Suryanarayana Maddu, et al.

We present a statistical learning framework for robust identification of partial differential equations from noisy spatiotemporal data. Extending previous sparse regression approaches for inferring PDE models from simulated data, we address key issues that have thus far limited the application of these methods to noisy experimental data, namely their robustness against noise and the need for manual parameter tuning. We address both points by proposing a stability-based model selection scheme to determine the level of regularization required for reproducible recovery of the underlying PDE. This avoids manual parameter tuning and provides a principled way to improve the method's robustness against noise in the data. Our stability selection approach, termed PDE-STRIDE, can be combined with any sparsity-promoting penalized regression model and provides an interpretable criterion for model component importance. We show that in particular the combination of stability selection with the iterative hard-thresholding algorithm from compressed sensing provides a fast, parameter-free, and robust computational framework for PDE inference that outperforms previous algorithmic approaches with respect to recovery accuracy, amount of data required, and robustness to noise. We illustrate the performance of our approach on a wide range of noise-corrupted simulated benchmark problems, including 1D Burgers, 2D vorticity-transport, and 3D reaction-diffusion problems. We demonstrate the practical applicability of our method on real-world data by considering a purely data-driven re-evaluation of the advective triggering hypothesis for an embryonic polarization system in C. elegans. Using fluorescence microscopy images of C. elegans zygotes as input data, our framework is able to recover the PDE model for the regulatory reaction-diffusion-flow network of the associated proteins.



1 Introduction

Predictive mathematical models, derived from first principles and symmetry arguments and validated in experiments, are of key importance for the scientific understanding of natural phenomena. While this approach has been particularly successful in describing spatiotemporal dynamical systems in physics and engineering, it has not seen the same degree of success in other scientific fields, such as neuroscience, biology, finance, and ecology, because the underlying first principles in these areas remain largely elusive. Nevertheless, modeling in those areas has seen increasing use for formulating simplified mathematical equations wherever sufficient observational data are available for validation Mogilner et al. (2006); Sbalzarini (2013); Tomlin and Axelrod (2007); Duffy (2013); Adomian (1995). In biology, modern high-throughput technologies have enabled the collection of large-scale data sets, ranging from genomics, proteomics, and metabolomics data, to microscopy images and videos of cells and tissues. These data sets are routinely used to infer parameters in hypothesized models, or to perform model selection among a finite number of alternative hypotheses Donoho (2017); Barnes et al. (2011); Asmus et al. (2017). The amount and quality of biological data, as well as the performance of computing hardware and computational methods, have now reached a level that promises direct inference of mathematical models of biological processes from the available experimental data. Such data-driven approaches seem particularly valuable in cell and developmental biology, where first principles are hard to come by, but large-scale imaging data are available, along with an accepted consensus of which phenomena a model could possibly entail. In such scenarios, data-driven modeling approaches have the potential to uncover the unknown first principles underlying the observed biological dynamics.

Biological dynamics can be formalized at different scales, from discrete molecular processes to the continuum mechanics of tissues. Here, we consider the macroscopic, continuum scale where spatiotemporal dynamics are modeled by Partial Differential Equations (PDEs) over coarse-grained state variables Kitano (2002); Tomlin and Axelrod (2007). PDE models have been used to successfully address a range of biological problems from embryo patterning Gregor et al. (2005) to modeling gene-expression networks Chen et al. (1999); Ay and Arnosti (2011) to predictive models of cell and tissue mechanics during growth and development Prost et al. (2015). They have shown their potential to recapitulate experimental observables in cases where the underlying physical phenomena are known or have been postulated Münster et al. (2019). In many biological systems, however, the governing PDEs are not (yet) known, which limits progress in discovering the underlying physical principles. Thus it is desirable to verify existing models or even discover new models by extracting governing laws directly from measured spatiotemporal data.

For given observable spatiotemporal dynamics, with no governing PDE known, several proposals have been put forward to learn mathematically and physically interpretable PDE models. The earliest work in this direction Voss et al. (1998) frames the problem of “PDE learning” as a multivariate nonlinear regression problem where each component in the design matrix consists of space and time differential operators and low-order non-linearities computed directly from data. The alternating conditional expectation (ACE) algorithm Breiman and Friedman (1985) is then used to compute both optimal element-wise non-linear transformations of each component and their associated coefficients. In Bär et al. (1999), the problem is formulated as a linear regression problem with a fixed pre-defined set of space and time differential operators and polynomial transformations that are computed directly from data. Backward elimination is then used to identify a compact set of PDE components by minimizing the least-squares error of the full model and successively removing the terms that reduce the fit the least. In the statistical literature Xun et al. (2013); Raissi et al. (2017), the PDE learning problem is formulated as a Bayesian non-parametric estimation problem where the observed dynamics are learned via non-parametric approximation, and a PDE representation serves as a prior distribution to compute the posterior estimates of the PDE coefficients. Recent influential work revived the idea of jointly learning the structure and the coefficients of PDE models from data in discrete space and time using sparse regression Brunton et al. (2016); Rudy et al. (2017); Schaeffer (2017). Approaches such as SINDy Brunton et al. (2016) and PDE-FIND Rudy et al. (2017) compute a large pre-assembled dictionary of possible PDE terms from data and identify the most promising components via penalized linear regression formulations. For instance, PDE-FIND is able to learn different types of PDEs from simulated spatiotemporal data, including the Burgers, Kuramoto-Sivashinsky, reaction-diffusion, and Navier-Stokes equations. PDE-FIND's performance was evaluated on noise-free simulated data as well as data with up to 1% additive noise, and showed a critical dependence on the proper setting of the tuning parameters, which are typically unknown in practice. Recent approaches attempt to alleviate this dependence by using Bayesian sparse regression schemes for model uncertainty quantification Zhang and Lin (2018) or information criteria for tuning-parameter selection Mangan et al. (2019).

There is also a growing body of work that considers deep neural networks for PDE learning Long et al. (2017); Raissi and Karniadakis (2018); Raissi et al. (2019). For instance, a deep feed-forward network formulation, PDE-NET Long et al. (2017), directly learns a computable discretized form of the underlying governing PDEs for forecasting Long et al. (2017, 2018). PDE-NET exploits the connection between differential operators and orders of sum rules of convolution filters Dong et al. (2017) in order to constrain network layers to learn valid discretized differential operators. The forecasting capability of this approach was numerically demonstrated for predefined linear differential operator templates. A compact and interpretable symbolic identification of the PDE structure is, however, not available with this approach.

Here, we ask whether and how it is possible to extend the class of sparse regression inference methods to work on real, limited, and noisy experimental data. As a first step, we present a statistical learning framework, PDE-STRIDE (STability-based Robust IDEntification of PDEs), to robustly infer PDE models from noisy spatiotemporal data without requiring manual tuning of learning parameters, such as regularization constants. PDE-STRIDE is based on the statistical principle of stability selection Meinshausen and Bühlmann (2010); Shah and Samworth (2013), which provides an interpretable criterion for any component's inclusion in the learned PDE in a data-driven manner. Stability selection can be combined with any sparsity-promoting regression method, including the LASSO Tibshirani (1996); Meinshausen and Bühlmann (2010), iterative hard thresholding (IHT) Blumensath and Davies (2008), Hard Thresholding Pursuit (HTP) Foucart (2011), or Sequential Thresholding Ridge Regression (STRidge) Rudy et al. (2017). PDE-STRIDE therefore provides a drop-in solution to render existing inference tools more robust, while reducing the need for parameter tuning. In the benchmarks presented herein, the combination of stability selection with de-biased iterative hard thresholding (IHT-d) empirically showed the best performance and the highest consistency with respect to perturbations of the dictionary matrix and sampling of the data.

This paper is organized as follows: Section 2 provides the mathematical formulation of the sparse regression problem and discusses how the design matrix is assembled. We also review the concepts of regularization paths and stability selection and discuss how they are combined in the proposed method. The numerical results in Section 3 highlight the performance and robustness of PDE-STRIDE for recovering different PDEs from noise-corrupted simulated data. We also perform an achievability analysis of the PDE-STRIDE+IHT-d inference scheme for consistency and convergence of the recovery probability with increasing sample size. Section 4 demonstrates that the robustness of the proposed method is sufficient for real-world applications. We consider learning a PDE model from noisy biological microscopy images of membrane protein dynamics in a C. elegans zygote. Section 5 provides a summary of our results and highlights future challenges for data-driven PDE learning.

Figure 1: Enabling data-driven mathematical model discovery through stability selection: We outline the necessary steps in our method for learning PDE models from spatiotemporal data. A: Extract spatiotemporal profiles from microscopy videos of the chosen state variables. Data courtesy of Grill Lab, MPI-CBG/TU Dresden Gross et al. (2019). B: Compile the design matrix $\Theta$ and the measurement vector $\mathbf{U}_t$ from the data. C: Construct multiple linear systems of reduced size through random sub-sampling of the rows of the design matrix $\Theta$ and of $\mathbf{U}_t$. D: Solve and record the sparse/penalized regression solutions independently for each sub-sample along the $\lambda$-paths. E: Compute the importance measure $\Pi_j^\lambda$ for each component $j$. The histogram shows the importance measure for all components at a particular value of $\lambda$. F: Construct the stability plot by aggregating the importance measures along the $\lambda$-path, leading to separation of the noise variables (dashed black) from the stable components (colored). Identify the most stable components by thresholding at $\pi_{\mathrm{th}}$. G: Build the PDE model from the identified components.

2 Problem Formulation and Optimization

We outline the problem formulation underlying the data-driven PDE inference considered here. We review important sparse regression techniques and introduce the concept of stability selection used in PDE-STRIDE.

2.1 Problem formulation for PDE learning

We propose a framework for stable estimation of the structure and parameters of the governing equations of continuous dynamical systems from discrete spatiotemporal measurements or observations. Specifically, we consider PDEs for the multidimensional state variable $u(\mathbf{x}, t)$ of the form shown in Eq. (2.1.1), composed of polynomial non-linearities (e.g., $u^2$, $u^3$), spatial derivatives (e.g., $u_x$, $u_{xx}$), and the parametric dependence modeled through the coefficient vector $\xi$.

$u_t = F\left(u,\, u^2,\, \ldots,\, u_x,\, u_{xx},\, u\,u_x,\, \ldots;\ \xi\right)$   (2.1.1)

Here, $F$ is the function map that models the spatiotemporal non-linear dynamics of the system. For simplicity, we limit ourselves to forms of the function map that can be written as linear combinations of polynomial non-linearities, spatial derivatives, and combinations of both. For instance, for a one-dimensional ($d = 1$) state variable $u(x, t)$, the function map can take the form:

$u_t = \xi_1 u + \xi_2 u^2 + \xi_3 u_x + \xi_4 u_{xx} + \xi_5 u\,u_x + \ldots$   (2.1.2)

where $\xi_1, \ldots, \xi_p$ are the coefficients of the components of the PDE for $u$. The continuous PDE of the form described in Eq. (2.1.2), with appropriate coefficients $\xi$, holds true for all continuous space and time points in the domain of the model. Numerical solutions of the PDE try to satisfy the equality relation in Eq. (2.1.2) for reconstituting the non-linear dynamics of a dynamical system at discrete space and time points $(x_i, t_j)$. We assume that we have access to noisy observational data of the state variable $u$ over the discrete space and time points. The measurement errors are independent and identically distributed and are assumed to follow a normal distribution with mean zero and variance $\sigma^2$.

We follow the formulation put forward in Bär et al. (1999); Rudy et al. (2017); Schaeffer (2017) and construct a large dictionary of PDE components using discrete approximations of the dynamics from data. For instance, for the one-dimensional example in Eq. (2.1.2), the discrete approximation with $p$ PDE terms can be written in vectorized form as a linear regression problem:

$\mathbf{U}_t = \Theta\,\xi$   (2.1.3)

Here, the left-hand side $\mathbf{U}_t \in \mathbb{R}^n$ is a discrete approximation to the time derivatives of $u$ at the $n$ discretization points and represents the response or outcome vector in the linear regression framework. Each column of the dictionary or design matrix $\Theta \in \mathbb{R}^{n \times p}$ represents the discrete approximation of one PDE component, i.e., one of the terms in Eq. (2.1.2), at the discrete points in space and time $(x_i, t_j)$. Each column is interpreted as a potential predictor of the response vector $\mathbf{U}_t$. The vector $\xi \in \mathbb{R}^p$ is the vector of unknown PDE coefficients, i.e., the pre-factors of the terms in Eq. (2.1.2).

Both $\mathbf{U}_t$ and $\Theta$ need to be constructed from numerical approximations of the temporal and spatial derivatives of the observed state variables. There is a rich literature in numerical analysis on this topic (see, e.g., Chartrand (2011); Stickel (2010)). Here, we approximate the time derivatives by first-order forward finite differences after initial denoising of the data. Similarly, the spatial derivatives are computed by applying second-order central finite differences. Details about the denoising are given in Section 3 and the Supplemental Material.
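To make this construction concrete, a minimal numpy sketch of assembling $\mathbf{U}_t$ and $\Theta$ from a space-time data matrix could look as follows. The five-component dictionary and the periodic boundary handling are illustrative assumptions, not the paper's full dictionary:

```python
import numpy as np

def build_linear_system(u, dx, dt):
    """Assemble the response vector U_t and a small dictionary Theta
    from a space-time field u[i, j] = u(x_i, t_j), as in Eq. (2.1.3).
    """
    # first-order forward differences in time (explicit-Euler style);
    # the last time slice is dropped, as no forward difference exists there
    u_t = (u[:, 1:] - u[:, :-1]) / dt
    u0 = u[:, :-1]

    # second-order central differences in space (periodic wrap for brevity)
    u_x = (np.roll(u0, -1, axis=0) - np.roll(u0, 1, axis=0)) / (2 * dx)
    u_xx = (np.roll(u0, -1, axis=0) - 2 * u0 + np.roll(u0, 1, axis=0)) / dx**2

    # flatten each candidate PDE component into one column of the design matrix
    cols = [u0, u0**2, u_x, u_xx, u0 * u_x]
    theta = np.column_stack([c.ravel() for c in cols])
    return u_t.ravel(), theta
```

Each column of `theta` is one candidate predictor of the time derivative; larger dictionaries are built the same way by appending further columns.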

Given the general linear regression ansatz in Eq. (2.1.3), we formulate the data-driven PDE inference problem as a regularized optimization problem of the form:

$\hat{\xi} = \operatorname{arg\,min}_{\xi \in \mathbb{R}^p} \; f(\xi) + \lambda\, R(\xi)$   (2.1.4)

where $\hat{\xi}$ is the minimizer of the objective function, $f(\xi)$ a smooth convex data-fitting function, $R(\xi)$ a regularization or penalty function, and $\lambda \geq 0$ a scalar tuning parameter that balances data fitting and regularization. The function $R$ is not necessarily convex or differentiable. We follow previous work Bär et al. (1999); Rudy et al. (2017); Schaeffer (2017) and consider the standard least-squares objective

$f(\xi) = \tfrac{1}{2}\,\lVert \mathbf{U}_t - \Theta\,\xi \rVert_2^2$.   (2.1.5)

The choice of the penalty function $R$ influences the properties of the coefficient estimates $\hat{\xi}$. Following Brunton et al. (2016); Rudy et al. (2017); Schaeffer (2017), we seek to identify a small set of PDE components among the $p$ possible components that accurately predicts the time evolution of the state variables. This implies that we want to identify a sparse coefficient vector $\hat{\xi}$, thus resulting in an interpretable PDE model. This can be achieved through sparsity-promoting penalty functions $R$. We next consider different choices for $R$ that enforce sparsity in the coefficient vector and review algorithms that solve the associated optimization problems.

2.2 Sparse optimization for PDE learning

The least-squares loss in Eq. (2.1.5) can be combined with different sparsity-inducing penalty functions $R$. The prototypical example is the $\ell_1$-norm, leading to the LASSO formulation of sparse linear regression Tibshirani (1996):

$\hat{\xi} = \operatorname{arg\,min}_{\xi \in \mathbb{R}^p} \; \tfrac{1}{2}\,\lVert \mathbf{U}_t - \Theta\,\xi \rVert_2^2 + \lambda\,\lVert \xi \rVert_1$   (2.2.1)

The LASSO objective comprises a convex smooth loss and a convex non-smooth regularizer. For this class of problems, efficient optimization algorithms exist that exploit the properties of the functions and come with convergence guarantees. Important examples include coordinate-descent algorithms Wu and Lange (2008); Friedman et al. (2010) and proximal algorithms, including the Douglas-Rachford algorithm Eckstein and Bertsekas (1992) and the projected (or proximal) gradient method, also known as the Forward-Backward algorithm Combettes and Pesquet (2011). In signal processing, the latter schemes are termed iterative shrinkage-thresholding algorithms (ISTA) (see Beck and Teboulle (2009) and references therein) and can be extended to non-convex penalties. Although the LASSO has been previously used for PDE learning in Schaeffer (2017), the statistical performance of the LASSO estimates is known to deteriorate if certain conditions on the design matrix are not met. For example, the studies in Meinshausen et al. (2006); Zhao and Yu (2006) provide sufficient and necessary conditions, called the irrepresentable conditions, for consistent variable selection using the LASSO, essentially excluding too strong correlations of the predictors in the design matrix. These conditions are, however, difficult to check in practice, as they require knowledge of the true components of the model. One way to relax these conditions is via randomization. The Randomized LASSO Meinshausen and Bühlmann (2010) considers the following objective:

$\hat{\xi} = \operatorname{arg\,min}_{\xi \in \mathbb{R}^p} \; \tfrac{1}{2}\,\lVert \mathbf{U}_t - \Theta\,\xi \rVert_2^2 + \lambda \sum_{j=1}^{p} \frac{|\xi_j|}{W_j}$   (2.2.2)

where each $W_j$ is an i.i.d. random variable uniformly distributed over $[\alpha, 1]$ with weakness parameter $\alpha \in (0, 1]$. For $\alpha = 1$, the Randomized LASSO reduces to the standard LASSO. The Randomized LASSO has been shown to successfully overcome the limitations of the LASSO in handling correlated components in the dictionary Meinshausen and Bühlmann (2010), while simultaneously preserving the overall convexity of the objective function. As part of our PDE-STRIDE framework, we evaluate the performance of the Randomized LASSO in the context of PDE learning using cyclical coordinate descent Friedman et al. (2010).
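A minimal sketch of the Randomized LASSO, solved here by proximal gradient (ISTA) rather than the cyclical coordinate descent used in the paper, could look as follows. The per-coordinate thresholds implement the weighted penalty of Eq. (2.2.2); all numerical choices are illustrative:

```python
import numpy as np

def randomized_lasso(theta, y, lam, alpha=0.5, n_iter=500, rng=None):
    """Randomized LASSO via proximal gradient (ISTA), a minimal sketch.

    Each coefficient j gets an i.i.d. weight W_j ~ Uniform[alpha, 1];
    its effective penalty is lam / W_j, so a soft threshold lam / W_j
    is applied coordinate-wise. alpha = 1 recovers the plain LASSO.
    """
    rng = np.random.default_rng(rng)
    w = rng.uniform(alpha, 1.0, size=theta.shape[1])
    step = 1.0 / np.linalg.norm(theta, 2) ** 2  # 1/L with L = ||Theta||_2^2
    xi = np.zeros(theta.shape[1])
    for _ in range(n_iter):
        grad = theta.T @ (theta @ xi - y)       # gradient of the LS loss
        z = xi - step * grad
        thr = step * lam / w                    # coordinate-wise thresholds
        xi = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)  # soft threshold
    return xi
```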

The sparsity-promoting property of the (weighted) $\ell_1$-norm comes at the expense of considerable bias in the estimation of the non-zero coefficients Zhao and Yu (2006), thus leading to reduced variable selection performance in practice. This drawback can be alleviated by using non-convex penalty functions Fan and Li (2001); Zhang (2010), allowing near-optimal variable selection performance at the cost of needing to solve a non-convex optimization problem. For instance, using the $\ell_0$-“norm” (which counts the number of non-zero elements of a vector) as regularizer leads to the NP-hard problem:

$\hat{\xi} = \operatorname{arg\,min}_{\xi \in \mathbb{R}^p} \; \tfrac{1}{2}\,\lVert \mathbf{U}_t - \Theta\,\xi \rVert_2^2 + \lambda\,\lVert \xi \rVert_0$   (2.2.3)

This formulation has found widespread application in compressive sensing and signal processing. Algorithms that deliver approximate solutions to the objective function in Eq. (2.2.3) include greedy optimization strategies Tropp (2004), Compressive Sampling Matching Pursuit (CoSaMP) Needell and Tropp (2009), and subspace pursuit Dai and Milenkovic (2009). We here consider the Iterative Hard Thresholding (IHT) algorithm Blumensath and Davies (2008, 2009), which belongs to the class of ISTA algorithms. Given the design matrix $\Theta$ and the measurement vector $\mathbf{U}_t$, IHT computes sparse solutions by applying a non-linear shrinkage (thresholding) operator to gradient descent steps in an iterative manner. One step of the iterative scheme reads:

$\xi^{k+1} = H_\lambda\!\left(\xi^k + \Theta^\top\left(\mathbf{U}_t - \Theta\,\xi^k\right)\right)$   (2.2.4)

The operator $H_\lambda$ is the non-linear hard-thresholding operator, which sets to zero all entries whose magnitude falls below the threshold induced by $\lambda$. Convergence of the above IHT algorithm is guaranteed under the operator-norm condition $\lVert \Theta \rVert_2 < 1$. Specifically, under this condition, the IHT algorithm is guaranteed to not increase the cost function in Eq. (2.2.3) (Lemma 1 in Blumensath and Davies (2008)). The above IHT algorithm can also be viewed as a thresholded version of the classical Landweber iteration Herrity et al. (2006). The fixed points of the non-linear operator in Eq. (2.2.4) are local minima of the cost function in Eq. (2.2.3) (Lemma 3 in Blumensath and Davies (2008)). Under the same condition on the design matrix, i.e., $\lVert \Theta \rVert_2 < 1$, the optimal solution of the cost function in Eq. (2.2.3) thus belongs to the set of fixed points of the IHT algorithm (Theorem 2 in Blumensath and Davies (2008) and Theorem 12 in Tropp (2006)). Although the IHT algorithm comes with theoretical convergence guarantees, the resulting fixed points are not necessarily sparse Blumensath and Davies (2008).
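A compact numpy sketch of the IHT iteration in Eq. (2.2.4) might read as follows. The rescaling that enforces $\lVert \Theta \rVert_2 < 1$ and the threshold $\sqrt{2\lambda}$ (the proximal threshold of the $\ell_0$ penalty with the $\tfrac{1}{2}$-scaled loss) are standard choices, assumed here for illustration:

```python
import numpy as np

def iht(theta, y, lam, n_iter=200):
    """Iterative hard thresholding for the l0-penalized least squares
    of Eq. (2.2.3), in the spirit of Blumensath & Davies (2008); a sketch.
    """
    # rescale so the operator-norm condition ||Theta||_2 < 1 holds
    scale = 1.0 / (np.linalg.norm(theta, 2) * 1.01)
    th, yy = theta * scale, y * scale
    thr = np.sqrt(2.0 * lam)  # hard threshold induced by the l0 penalty
    xi = np.zeros(theta.shape[1])
    for _ in range(n_iter):
        z = xi + th.T @ (yy - th @ xi)          # Landweber (gradient) step
        xi = np.where(np.abs(z) > thr, z, 0.0)  # hard thresholding H_lambda
    return xi
```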

Here, we propose a modification of the IHT algorithm that proves particularly suited for solving PDE learning problems with PDE-STRIDE. Following a proposal in Foucart (2011) for the Hard Thresholding Pursuit (HTP) algorithm, we equip the standard IHT algorithm with an additional debiasing step. At each iteration, we solve a least-squares problem restricted to the support obtained from the IHT iteration. We refer to this form of IHT as Iterative Hard Thresholding with debiasing (IHT-d). In this two-stage algorithm, the standard IHT step serves to extract the explanatory variables, while the debiasing step approximately debiases (or refits) the coefficients restricted to the currently active support Figueiredo et al. (2007). Rather than solving the least-squares problem to optimality, we use gradient descent steps until a loose upper bound on the least-squares refit is satisfied. This prevents over-fitting by attributing low confidence to large supports and reduces computational overhead. The complete IHT-d procedure is detailed in Algorithm 1 in the Supplementary Material. In PDE-STRIDE, we compare IHT-d with a heuristic iterative algorithm, Sequential Thresholding of Ridge regression (STRidge), that also relies on hard thresholding and is available in PDE-FIND Rudy et al. (2017).
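The two-stage structure of IHT-d can be sketched as follows. For simplicity, the debiasing step here solves the restricted least-squares problem exactly with `lstsq`, whereas the paper uses a few gradient steps with a loose bound; the thresholds follow the IHT sketch above:

```python
import numpy as np

def iht_debiased(theta, y, lam, n_iter=100):
    """IHT with a debiasing (refitting) step, a sketch of IHT-d."""
    scale = 1.0 / (np.linalg.norm(theta, 2) * 1.01)  # enforce ||Theta||_2 < 1
    th, yy = theta * scale, y * scale
    thr = np.sqrt(2.0 * lam)
    xi = np.zeros(theta.shape[1])
    for _ in range(n_iter):
        # stage 1 (IHT step): extract the explanatory variables
        z = xi + th.T @ (yy - th @ xi)
        xi = np.where(np.abs(z) > thr, z, 0.0)
        # stage 2 (debias): refit coefficients on the active support
        s = np.nonzero(xi)[0]
        if s.size:
            xi[:] = 0.0
            xi[s] = np.linalg.lstsq(theta[:, s], y, rcond=None)[0]
    return xi
```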

2.3 Stability selection

The practical performance of sparse optimization techniques for PDE learning critically hinges on the proper selection of the regularization parameter $\lambda$ that balances model fit and complexity of the coefficient vector. In model discovery tasks on real experimental data, a wrong choice of the regularization parameter could result in incorrect PDE model selection even if true model discovery would have been, in principle, achievable. In statistics, a large number of tuning-parameter selection criteria are available, ranging from cross-validation approaches Kohavi (1995) to information criteria Schwarz (1978), and formulations that allow joint learning of model coefficients and tuning parameters Lederer and Müller (2015); Bien et al. (2018). Here, we advocate stability-based model selection Meinshausen and Bühlmann (2010) for robust PDE learning. The statistical principle of stability Yu (2013) has been put forward as one of the pillars of modern data science and statistics and provides an intuitive approach to model selection Meinshausen and Bühlmann (2010); Shah and Samworth (2013); Liu et al. (2010).

In the context of sparse regression, stability selection Meinshausen and Bühlmann (2010) proceeds as follows (see also Figure 1 for an illustration). Given a design matrix $\Theta$ and the measurement vector $\mathbf{U}_t$, we generate $N^*$ random subsample index sets $I_s$ of equal size and produce reduced sub-designs $\Theta^{I_s}$ and $\mathbf{U}_t^{I_s}$ by choosing rows according to the index set $I_s$. For each of the resulting subproblems, we apply a sparse regression technique and systematically record the recovered support as a function of $\lambda$ over a regularization path $[\lambda_{\min}, \lambda_{\max}]$. The values of $\lambda_{\max}$ are data dependent and are easily computable for generalized linear models with convex penalties Friedman et al. (2010). Similarly, the critical parameter $\lambda_{\max}$ for the non-convex problem in Eq. (2.2.3) can be evaluated from optimality conditions (Theorem 12 in Tropp (2006) and Theorem 1 in Blumensath and Davies (2008)). The lower bound of the regularization path is set to $\lambda_{\min} = \epsilon\,\lambda_{\max}$ with a small default value of $\epsilon$. The $\lambda$-dependent stability (or importance) measure for each coefficient $j$ is then computed as:

$\Pi_j^\lambda = \frac{1}{N^*} \sum_{s=1}^{N^*} \mathbb{1}\!\left[\hat{\xi}_j^{\lambda, I_s} \neq 0\right]$   (2.3.1)

where $s = 1, \ldots, N^*$ indexes the independent random sub-samples. The stability of each coefficient can be plotted across the $\lambda$-path, resulting in a component stability profile (see Figure 1F for an illustration). This visualization provides an intuitive overview of the importance of the different model components. Different from the original stability selection proposal Meinshausen and Bühlmann (2010), we define the stable components of the model as follows:

$\hat{S}^{\mathrm{stable}} = \left\{ j \,:\, \max_{\lambda \in [\lambda_{\min}, \lambda_{\max}]} \Pi_j^\lambda \geq \pi_{\mathrm{th}} \right\}$   (2.3.2)

Here, $\pi_{\mathrm{th}}$ denotes the critical stability threshold parameter and can be set following Meinshausen and Bühlmann (2010). The default setting is $\pi_{\mathrm{th}} = 0.8$. In an exploratory data analysis mode, the threshold can also be set through visual inspection of the stability plots, thereby allowing the principled exploration of several alternative PDE models. The stability measures also provide an interpretable criterion for a model component's importance, thereby guiding the user to build the right model with high probability. As we will show in the numerical experiments, stability selection ensures robustness against varying dictionary size, different types of data sampling, noise in the data, and variability of the sub-optimal solutions when non-convex penalties are used. All of these properties are critical for consistent and reproducible model learning in real-world applications. Under certain conditions, stability selection also provides an upper bound on the expected number of false positives. Such guarantees are not generally assured by any sparsity-promoting regression method in isolation Shah and Samworth (2013). For instance, stability selection combined with the randomized LASSO (Eq. (2.2.2) with $\alpha < 1$) is consistent for variable selection even when the irrepresentable condition is violated Meinshausen and Bühlmann (2010).
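The sub-sampling loop behind the importance measure in Eq. (2.3.1) can be sketched as follows. The `solver` argument stands for any of the sparse regression methods above; the subsample size of half the rows is an assumption in line with standard stability selection:

```python
import numpy as np

def importance_measures(theta, y, lams, solver, n_subsamples=100, rng=None):
    """Compute the stability measure Pi_j^lambda of Eq. (2.3.1): the
    fraction of random subsamples (half the rows, drawn without
    replacement) on which component j enters the recovered support.

    `solver(theta_s, y_s, lam)` is any sparse regression routine
    returning a coefficient vector.
    """
    rng = np.random.default_rng(rng)
    n, p = theta.shape
    pi = np.zeros((len(lams), p))
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)  # one subsample I_s
        for k, lam in enumerate(lams):
            xi = solver(theta[idx], y[idx], lam)
            pi[k] += (xi != 0).astype(float)   # count support membership
    return pi / n_subsamples
```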

3 Numerical experiments on simulation data

We present numerical experiments in order to benchmark the performance and robustness of PDE-STRIDE combined with different sparsity-promoting regression methods to infer PDEs from spatiotemporal data. In order to provide comparisons and benchmarks, we first use simulated data obtained by numerically solving known ground-truth PDEs, before applying our method to a real-world data set from biology. The benchmark experiments on simulation data are presented in four sub-sections that demonstrate different aspects of the inference framework: Sub-section 3.1 demonstrates the use of different sparsity-promoting regression methods in our framework in a simple 1D Burgers problem. Sub-section 3.2 then compares their performance in order to choose the best regression method, IHT-d. In sub-section 3.3, stability selection is combined with IHT-d to recover 2D vorticity-transport and 3D reaction-diffusion PDEs from limited, noisy simulation data. Sub-section 3.4 reports achievability results to quantify the robustness of stability selection to perturbations in dictionary size, sample size, and noise levels.

STability-based Robust IDEntification of PDEs (PDE-STRIDE)
Given the noise-corrupted data and a choice of regression method, e.g., (randomized) LASSO, IHT, HTP, IHT-d, STRidge.

  1. Apply any required de-noising method to the noisy data and compute the spatial derivatives and non-linearities to construct the design matrix $\Theta$ and the time-derivatives vector $\mathbf{U}_t$ for suitable sample size and dictionary size, $n$ and $p$, respectively.

  2. Build the sub-samples $\Theta^{I_s}$ and $\mathbf{U}_t^{I_s}$, for $s = 1, \ldots, N^*$, by uniformly randomly sub-sampling rows from the design matrix and the corresponding rows from $\mathbf{U}_t$. For every sub-sample $s$, standardize the sub-design matrix such that $\sum_i \Theta^{I_s}_{ij} = 0$ and $\frac{1}{|I_s|}\sum_i \big(\Theta^{I_s}_{ij}\big)^2 = 1$, for $j = 1, \ldots, p$. Here, $\Theta^{I_s}_{ij}$ is the element in row $i$ and column $j$ of the matrix $\Theta^{I_s}$. The corresponding measurement vector is centered to zero mean.

  3. Apply the sparsity-promoting regression method independently to each sub-sample to construct the paths for $L$ values of $\lambda$, as discussed in Section 2.3.

  4. Compute the importance measures $\Pi_j^\lambda$ of all dictionary components along the discretized $\lambda$-path, as discussed in Section 2.3. Select the stable support set $\hat{S}^{\mathrm{stable}}$ by applying the threshold $\pi_{\mathrm{th}}$ to the $\Pi_j^\lambda$. Solve a linear least-squares problem restricted to the support $\hat{S}^{\mathrm{stable}}$ to identify the coefficients of the learned model.
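The final step of the procedure, thresholding the importance profiles and refitting the stable support, might be sketched as:

```python
import numpy as np

def select_and_refit(theta, y, pi, pi_th=0.8):
    """Threshold the importance profiles pi (shape: path length x p)
    at pi_th as in Eq. (2.3.2), then refit the coefficients of the
    stable support by ordinary least squares. A sketch.
    """
    stable = np.nonzero(pi.max(axis=0) >= pi_th)[0]  # stable support set
    xi = np.zeros(theta.shape[1])
    if stable.size:
        xi[stable] = np.linalg.lstsq(theta[:, stable], y, rcond=None)[0]
    return stable, xi
```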

Adding noise to the simulation data

Let $\mathbf{U}$ be the vector of clean simulation data sampled in both space and time. This vector is corrupted with additive Gaussian noise to $\mathbf{U}^\sigma = \mathbf{U} + \sigma\,\boldsymbol{\eta}$, where $\boldsymbol{\eta}$ is additive Gaussian noise with standard deviation equal to the empirical standard deviation of the entries in the vector $\mathbf{U}$, and $\sigma$ is the level of Gaussian noise added.
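A short numpy version of this noise model, with the noise level interpreted as a multiple of the empirical standard deviation as described above, could be:

```python
import numpy as np

def add_noise(u, level, rng=None):
    """Corrupt clean simulation data with additive Gaussian noise whose
    standard deviation is `level` times the empirical standard deviation
    of the data. A sketch of the corruption used in the benchmarks.
    """
    rng = np.random.default_rng(rng)
    return u + level * u.std() * rng.standard_normal(u.shape)
```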

Computing the data vector $\mathbf{U}_t$

The data vector $\mathbf{U}_t$ contains numerical approximations to the time derivatives of the state variable at different points in space and time. We compute these by first-order forward finite differences (i.e., the explicit Euler scheme) after initial de-noising of the data. Similarly, the spatial derivatives are computed by applying second-order central finite differences directly to the de-noised data. For de-noising we use truncated singular value decomposition (SVD) with a cut-off at the elbow of the singular-values curve, as shown in Supplementary Figures S2 and S3.
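The truncated-SVD de-noising step can be sketched as follows; the cut-off rank, chosen at the elbow of the singular-value curve in the paper, is passed in explicitly here:

```python
import numpy as np

def svd_denoise(u, rank):
    """De-noise a space-time data matrix by truncated SVD, keeping
    only the `rank` leading singular modes.
    """
    U, s, Vt = np.linalg.svd(u, full_matrices=False)
    # reconstruct from the leading modes; the remainder is treated as noise
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```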

Fixing the parameters for stability selection

We propose that PDE-STRIDE combined with IHT-d provides a parameter-free PDE learning method. Therefore, all stability selection parameters described in Section 2.3 are fixed throughout our numerical experiments. The choice of these statistical parameters is well discussed in the literature Meinshausen and Bühlmann (2010); Bühlmann et al. (2014); Friedman et al. (2010). We thus fix the repetition number $N^*$, the regularization path parameter $\epsilon$, the $\lambda$-path size $L$, and the importance threshold $\pi_{\mathrm{th}} = 0.8$. Using these standard choices, the method works robustly across all tests presented hereafter, and is parameter-free in that sense. In both stability and regularization plots, we show the normalized value of the regularization constant $\lambda / \lambda_{\max}$. Although the stable component set is evaluated as in Eq. (2.3.2), the entire stability profile of each component from $\lambda_{\max}$ to $\lambda_{\min}$ is shown in all our stability plots. This way, we gain additional graphical insight into how each component evolves along the $\lambda$-path.

3.1 Case study with 1D Burgers equation and different sparsity-promoters

We show that stability selection can be combined with any sparsity-promoting penalized regression to learn PDE components from noisy and limited spatiotemporal data. We use simulated data of the 1D Burgers equation

(3.1.1)

with identical boundary and initial conditions as used in Rudy et al. (2017) to provide a fair comparison between methods: periodic boundaries in space and a Gaussian initial condition.

The simulation domain is divided uniformly into 256 Cartesian grid points in space and 1000 time points. The numerical solution is visualized in space-time in Figure 4. It was obtained using a parabolic finite-difference method with explicit Euler time-stepping.
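For orientation, a minimal explicit finite-difference solver of this kind could read as follows (Python/NumPy; the domain size, step sizes, and initial-condition parameters are illustrative, not the exact benchmark values):

```python
import numpy as np

def burgers_solve(nx=256, nt=1000, L=16.0, nu=0.1, dt=1e-3):
    """Explicit finite-difference solution of u_t = -u u_x + nu u_xx with
    periodic boundaries and a Gaussian initial condition."""
    dx = L / nx
    x = np.linspace(-L / 2, L / 2, nx, endpoint=False)
    u = np.exp(-(x + 2.0) ** 2)                  # Gaussian bump
    U = np.empty((nx, nt))
    for n in range(nt):
        U[:, n] = u
        ux = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)        # central
        uxx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2  # central
        u = u + dt * (-u * ux + nu * uxx)        # explicit Euler step
    return x, U
```

With these parameters the scheme is stable (diffusion number nu*dt/dx² ≈ 0.026), and on the periodic domain the total mass of u is conserved to floating-point accuracy.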

Figure 2: Model selection with PDE-STRIDE for the 1D Burgers equation: The top row shows regularization paths (see Section 2.3) for three sparsity-promoting regression techniques: randomized LASSO, STRidge, and IHT-d, all for the same design. The inset at the bottom shows the color correspondence with the dictionary components. The ridge parameter for STRidge is fixed as in Rudy et al. (2017), and the randomization parameter of the randomized LASSO is set accordingly. In all three cases, the standard threshold (red solid line) correctly identifies the true components. Note that the threshold is adjusted in the randomized LASSO case for demonstrating stability selection.

We test combinations of stability selection with the three sparsity-promoting regression techniques described in Section 2.2: randomized LASSO, STRidge, and IHT-d. The top row of Figure 2 shows the corresponding regularization paths; the bottom row shows the stability profiles for each component in the dictionary. The colored solid lines correspond to the advection and diffusion terms of Eq. (3.1.1). Thresholding the importance measure at 0.8, all sparsity-promoting regression methods identify the correct components in the stability plots (solid colored lines) and separate them from the noise variables (dashed black lines).

3.2 Comparison between sparsity-promoting techniques

Although stability selection can be used in conjunction with any sparsity-promoting regression method, the question arises whether a particular choice of regression algorithm is especially well suited for PDE learning. We therefore perform a systematic comparison between LASSO, STRidge, and IHT-d for recovering the 1D Burgers equation under perturbations to the sample size, the dictionary size, and the noise level. An experiment for a particular triple is considered a success if there exists a regularization value (see Section 2.3) for which the true PDE components are recovered. Figure 3 shows the success frequencies over 30 independent repetitions with uniformly random data sub-samples.


Figure 3: Comparison between different sparsity promoters for the 1D Burgers equation: Each colored square corresponds to a design with a certain sample size, dictionary size, and noise level. Color indicates the success frequency over 30 independent repetitions with uniformly random data sub-samples. “Success” is defined as the existence of a regularization value for which the correct PDE is recovered from the data. The columns compare the three sparsity-promoting techniques: LASSO, STRidge, and IHT-d (left to right), as labeled at the top.

A first observation from Figure 3 is that the solutions obtained with STRidge and IHT-d are better than the relaxed solutions obtained with LASSO. We also observe that IHT-d performs better than LASSO and STRidge for large dictionary sizes, high noise, and small sample sizes. Large dictionaries with higher-order derivatives computed from discrete data cause grouping (correlations between variables), in which case LASSO tends to select one variable from each group, ignoring the others Wang et al. (2011). Thus, LASSO fails to identify the true support consistently. STRidge shows good recovery for large dictionary sizes with clean data, but it breaks down in the presence of noise. Of the three methods, IHT-d shows the best robustness to both noise and changes in the design. We note a decrease in inference power with increasing sample size, especially for large dictionaries and high noise levels. This again can be attributed to correlations and groupings in the dictionary, which become more prominent with increasing sample size.

Based on these results, we use IHT-d as the sparsity-promoting regression method in conjunction with PDE-STRIDE for model selection in the remaining sections.
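For reference, the essence of iterative hard thresholding with de-biasing, gradient steps on the least-squares loss, hard thresholding, and a final least-squares re-fit on the surviving support, can be sketched as follows (Python/NumPy; the step size and thresholding rule are simplified relative to the IHT-d variant described in Section 2.2):

```python
import numpy as np

def iht_d(Theta, ut, lam, n_iter=500):
    """Sketch of iterative hard thresholding with de-biasing: gradient steps
    on ||ut - Theta w||^2, hard thresholding at lam, then an exact
    least-squares re-fit (de-biasing) on the final support."""
    L = np.linalg.norm(Theta, 2) ** 2        # Lipschitz constant of gradient
    w = np.zeros(Theta.shape[1])
    for _ in range(n_iter):
        w = w + Theta.T @ (ut - Theta @ w) / L   # gradient step
        w[np.abs(w) < lam] = 0.0                 # hard threshold
    support = w != 0
    if support.any():                            # de-biasing re-fit
        w[support], *_ = np.linalg.lstsq(Theta[:, support], ut, rcond=None)
    return w, support
```

In PDE-STRIDE, the boolean supports returned by such a solver on random sub-samples are what feed the importance measures of Section 2.3.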

3.3 Stability-based model inference

We present benchmark results for PDE recovery with PDE-STRIDE using IHT-d as the sparse regression method. This combination is used to recover PDEs from limited noisy data obtained by numerical solution of the 1D Burgers, 2D vorticity-transport, and 3D Gray-Scott equations. Once the support of the PDE model is learned, the coefficient values of the non-zero components are determined by solving the linear least-squares problem restricted to the recovered support. More sophisticated methods could be used for estimating the parameters of a PDE of known structure from limited noisy data, as in Raissi et al. (2017); Xun et al. (2013). This is beyond the scope of this paper, however, since in all cases considered the sample size significantly exceeds the cardinality of the recovered support, for which a linear least-squares fit provides good estimates of the PDE coefficients.

3.3.1 1D Burgers equation

We again consider the 1D Burgers equation from Eq. (3.1.1), using the same simulated data as in Section 3.1, to quantify the performance and noise robustness of the PDE-STRIDE+IHT-d method. The results are shown in Figure 4. Even on this small data set, with a sample size comparable to the dictionary size, our method recovers the correct model at all tested noise levels, although the least-squares estimates of the coefficient values gradually deviate from their exact values with increasing noise (see Table 1).

Figure 4: Model selection with PDE-STRIDE+IHT-d for 1D Burgers equation recovery: The top-left image shows the numerical solution of the 1D Burgers equation on the space-time grid. The stability plots show the separation of the true PDE components (solid color) from the noisy components (dotted black). The inference power of the PDE-STRIDE method is tested for increasing levels of additive Gaussian noise. In all cases, perfect recovery was possible with the fixed threshold on the importance measure (horizontal red solid line). The inset at the bottom shows the color correspondence with the dictionary components.

For comparison, the corresponding stability plots for PDE-STRIDE+STRidge are shown in Supplementary Figure S4. When using STRidge regression, the algorithm produces many false positives, even at mild noise levels.

noise level     u u_x        u_xx
clean           -1.0008      0.1000
                -0.9971      0.1016
                -0.9932      0.0997
                -0.9842      0.0976
                -0.9728      0.0984
                -0.9619      0.0967
Table 1: Coefficient values of the recovered 1D Burgers equation for increasing noise levels (top to bottom; the first row corresponds to clean data). The stable components of the PDE inferred from the plots in Figure 4 are the advection (u u_x) and diffusion (u_xx) terms.

3.3.2 2D vorticity transport equation

This section discusses results for the recovery of the 2D vorticity transport equation using PDE-STRIDE. The vorticity transport equation is obtained by taking the curl of the Navier-Stokes equations and imposing the divergence-free constraint enforcing incompressibility. This form of the Navier-Stokes equations has found extensive application in oceanography and climate modeling Temam (2001). For the numerical solution of the transport equation, we impose no-slip boundary conditions on the left, right, and bottom sides and a shear-flow boundary condition on the top side. The simulation code was written using the OpenFPM framework Incardona et al. (2019) with explicit time-stepping. A Poisson problem was solved at every time step to correct the velocities and ensure divergence-free fields. The viscosity of the simulated fluid was held constant. In Figure 5, we show a single time snapshot of the velocities and the vorticity field inside the square domain of the lid-driven cavity experiment.

(3.3.1)

Figure 6 shows the PDE-STRIDE results for the recovery of the 2D vorticity transport equation. The results demonstrate consistent recovery of the true support of the PDE across different noise levels. The stable components recovered correspond to the true PDE components of the 2D vorticity equation, Eq. (3.3.1). In Table 2, we report the refitted coefficients of the recovered PDE components. Note that the separation between the true components (colored solid lines) and the noisy ones (black dotted lines) becomes less distinct with increasing noise level. In Supplementary Figure S5, we also report the STRidge-based stability selection results for the same design and stability selection parameters. STRidge struggles to recover the true support even at small noise levels.

Figure 5: Numerical solution of the 2D vorticity transport equation: The 2D domain with velocity components and vorticity is illustrated. We choose to sample inside the rectangular box in the upper part of the domain to capture the rich dynamics resulting from the shear boundary condition imposed at the top surface. The black dots denote the points at which the data is sampled.
noise level     ω_xx       ω_yy       u ω_x      v ω_y
clean           0.02504    0.02502    -0.9994    -1.0025
                0.02501    0.02504    -0.9997    -1.0006
                0.02492    0.0250     -1.0003    -0.9944
                0.0247     0.0250     -1.004     -0.9841
                0.0245     0.0251     -1.0091    -0.9748
                0.0242     0.0251     -1.0083    -0.9586
Table 2: Coefficients of the recovered 2D vorticity transport equation for increasing noise levels (top to bottom; the first row corresponds to clean data). The stable components of the PDE from Figure 6 are the two diffusion and the two advection terms.
Figure 6: Model selection with PDE-STRIDE+IHT-d for 2D vorticity transport equation recovery: The stability plots show the separation of the true PDE components (solid color) from the noisy components (dotted black). The inference power of the PDE-STRIDE method is tested for increasing levels of additive Gaussian noise. In all cases, perfect recovery was possible with the fixed threshold on the importance measure (horizontal red solid line). The inset at the bottom shows the color correspondence with the dictionary components.

3.3.3 3D Gray-Scott equation

In this section, we report the recovery capabilities of PDE-STRIDE for the 3D Gray-Scott reaction-diffusion equations, Eqs. (3.3.2). Reaction and diffusion of chemical species can produce a variety of patterns reminiscent of those often observed in nature. Such processes form an essential basis for morphogenesis in biology Meinhardt (1982) and may even describe skin patterning and pigmentation Manukyan et al. (2017). We choose this example because it involves coupled variables and is very similar in its dynamics to the real-world example discussed in Section 4. The reaction-diffusion dynamics described by Eqs. (3.3.2) are commonly used to model the non-linear interactions between two chemical species.

(3.3.2a)
(3.3.2b)

We simulate the above equations using the OpenFPM framework. A snapshot of the cubic 3D simulation box along with the species concentration is shown in Fig. 7. A finite-difference scheme was used to discretize the dynamics of Eqs. (3.3.2) in space, and explicit time-stepping was used for temporal integration. The 3D Gray-Scott model parameters were kept fixed throughout.
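A minimal explicit-Euler sketch of such a Gray-Scott simulation on a periodic 3D grid could look as follows (Python/NumPy; grid size, boundary conditions, and parameter values are illustrative and much smaller than in the actual simulation):

```python
import numpy as np

def lap(f, dx):
    """Second-order periodic finite-difference Laplacian in 3D."""
    return sum(np.roll(f, 1, a) + np.roll(f, -1, a) - 2 * f
               for a in range(3)) / dx**2

def gray_scott(n=24, steps=200, dx=1.0, dt=0.5,
               Du=0.2, Dv=0.1, F=0.014, k=0.053, seed=0):
    """Explicit-Euler integration of the Gray-Scott system
        u_t = Du lap(u) - u v^2 + F (1 - u),
        v_t = Dv lap(v) + u v^2 - (F + k) v
    on a periodic 3D grid, seeded with a small central perturbation."""
    rng = np.random.default_rng(seed)
    u = np.ones((n, n, n))
    v = np.zeros((n, n, n))
    c = n // 2
    v[c-3:c+3, c-3:c+3, c-3:c+3] = 0.5           # seed a perturbation
    u += 0.01 * rng.standard_normal(u.shape)     # break symmetry
    for _ in range(steps):
        uvv = u * v * v
        u, v = (u + dt * (Du * lap(u, dx) - uvv + F * (1.0 - u)),
                v + dt * (Dv * lap(v, dx) + uvv - (F + k) * v))
    return u, v
```

The time step respects the explicit stability limit dt ≤ dx²/(6 Du) for the 3D diffusion stencil.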

Given the large number of degrees of freedom in the 3D problem, we choose to sample data for learning only from a small cube in the middle of the domain. In Figure 7, we show that the PDE-STRIDE method correctly identifies the true PDE components of the u-species dynamics given by Eq. (3.3.2a) for different noise levels. One can appreciate the clear separation between the true and noisy PDE components in the stability plots, obtained with few samples relative to the dictionary size. Similar plots for the inference of the v-species dynamics are shown in Fig. S1. Although perfect recovery was not possible there, owing to the small diffusivity of that species, consistent and stable recovery of the reaction (interaction) terms can be seen. The refitted coefficients of the recovered PDEs for both species are reported in Table 3 and Table S1, respectively. Comparison plots for PDE-STRIDE with STRidge for the 3D Gray-Scott recovery are shown in Supplementary Figures S6 and S7. STRidge is able to recover the complete form of Eqs. (3.3.2) for both species in the noise-free case, but it fails to recover both PDEs in the noisy case. This comparison demonstrates that PDE-STRIDE+IHT-d clearly outperforms PDE-STRIDE+STRidge for inference from noisy data-sets.

noise level     1          u          u v²
clean           0.0140    -0.0140    -1.0000
                0.0142    -0.0143    -0.9915
                0.0144    -0.0146    -0.9795
                0.0150    -0.0153    -0.9843
Table 3: Coefficients of the recovered u-component Gray-Scott reaction-diffusion equation for increasing noise levels (top to bottom; the first row corresponds to clean data). The stable components of the PDE from Figure 7 are the constant, linear, and interaction terms.
Figure 7: Model selection with PDE-STRIDE+IHT-d for 3D Gray-Scott u-component equation recovery: The top-left figure shows a visualization of the 3D simulation domain with the species concentration; the color gradient corresponds to the varying concentration over space. The stability plots show the separation of the true PDE components (solid color) from the noisy components (dotted black). The inference power of the PDE-STRIDE method is tested for increasing levels of additive Gaussian noise. In all cases, perfect recovery was possible with the fixed threshold on the importance measure (horizontal red solid line). The inset at the bottom shows the color correspondence with the dictionary components.

In all cases presented above, the learned stable components coincide with the true components of the underlying PDE model, and the PDE-STRIDE framework learns them from remarkably few data points. We thus conclude that our PDE-STRIDE+IHT-d framework is able to robustly learn partial differential equations from limited noisy data. In the next section, we discuss the consistency and robustness of PDE-STRIDE+IHT-d under perturbations of design parameters such as sample size, dictionary size, and noise level.

3.4 Achievability results

Achievability results are a compact way to check the robustness and consistency of a model selection method under varying design parameters. They also provide an approximate means of revealing the sample complexity of a sparsity-promoting technique, i.e., the number of data points required to recover the model with full probability. Specifically, given a sparsity-promoting method, dictionary size, sparsity, and noise level, we are interested in how the sample size must scale for the recovery probability to converge to one. The study Wainwright (2009) reports a sharp phase transition from failure to success for Gaussian random designs with increasing sample size for LASSO-based sparse solutions, and it provides sufficient lower bounds on the sample size for full recovery probability. In this section, we ask whether sparse model selection with PDE-STRIDE (paired with IHT-d) exhibits a similarly sharp phase-transition behavior. Since the dictionary components in our case are compiled from derivatives and non-linearities computed from noisy data, it is interesting to observe whether full recovery probability is achieved and maintained with increasing sample size. In the particular context of PDE learning, increasing the dictionary size by including higher-order non-linearities and derivatives has the potential to introduce strongly correlated components, which can negatively impact inference power, as evidenced by the results and discussion of Section 3.2.

Figures 8 and 9 show the achievability results for the 1D Burgers system and the u-component of the 3D Gray-Scott reaction-diffusion system, respectively. Each point corresponds to 20 repetitions of an experiment with a given design under random data sub-sampling. An experiment is considered a success if there exists a regularization value for which the true PDE support is recovered by PDE-STRIDE+IHT-d upon thresholding the importance measure. For both systems, we observe strong consistency and robustness to the design parameters, as well as a sharp phase transition from failure to success, with the recovery probability converging to one with increasing sample size. This evidence suggests that PDE-STRIDE not only enhances the inference power of the IHT-d method but also ensures consistency. In addition, the sharp phase-transition behavior points toward a strict lower bound on the sample complexity below which full recoverability is not attainable, as studied in Wainwright (2009). Achievability plots can therefore be used to read off approximate estimates of the sample complexity of the learned dynamical systems. In the Burgers case, full success probability is already achieved with remarkably few data points in the noise-free setting and with moderately more in the noisy settings; the same qualitative picture holds for the 3D Gray-Scott system.
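The experimental protocol behind each point of an achievability plot, estimating the empirical recovery probability at a given sample size, can be sketched as follows (Python/NumPy; the `solver` argument stands in for PDE-STRIDE+IHT-d and must return a boolean support vector):

```python
import numpy as np

def success_probability(Theta, ut, true_support, solver, lambdas,
                        N, repetitions=20, seed=0):
    """Empirical recovery probability at sample size N: an experiment
    succeeds if any lambda on the path recovers exactly the true support
    from a random data sub-sample of size N."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(repetitions):
        idx = rng.choice(Theta.shape[0], size=N, replace=False)
        for lam in lambdas:
            if np.array_equal(solver(Theta[idx], ut[idx], lam),
                              true_support):
                successes += 1
                break                     # one success per repetition
    return successes / repetitions
```

Sweeping N and plotting the returned probability reproduces the phase-transition curves of Figures 8 and 9 (with the paper's solver in place of a toy one).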

Figure 8: Achievability results for model selection with PDE-STRIDE+IHT-d for 1D Burgers equation recovery: Achievability results are shown for varying designs. Every point on the plot corresponds to 20 repetitions of the PDE-STRIDE method for a given design under random data sub-sampling. Each line with markers corresponds to a dictionary size, as shown in the inset below the plots. The colored area around each line shows the associated variance of the Bernoulli trials.
Figure 9: Achievability results for model selection with PDE-STRIDE+IHT-d for 3D Gray-Scott u-component equation recovery: Achievability results are shown for varying designs. Every point on the plot corresponds to 20 repetitions of PDE-STRIDE for a given design under random data sub-sampling. Each line with markers corresponds to a dictionary size, as shown in the inset below the plots. The colored area around each line shows the associated variance of the Bernoulli trials.
Figure 10: Data-driven model inference of the regulatory network of membrane PAR proteins from spatiotemporal data acquired from the C. elegans zygote: A: Spatiotemporal PAR concentration data-sets provided by the Grill lab Etemad-Moghadam et al. (1995); Goehring et al. (2011) at MPI-CBG. Observational boundaries (white ellipses) are defined manually, and the key variables of interest, protein concentration and cell-cortex velocity, are identified in the microscopy data. Blue and red correspond to the intensities of the posterior PAR (pPAR) and anterior PAR (aPAR) proteins, respectively. B: The noisy aPAR (red) and pPAR (blue) concentration profiles extracted from the fluorescence-microscopy experiments. C: De-noised spatiotemporal concentration profiles obtained by extracting the principal mode of the singular value decomposition (SVD) of the noisy data. Different symbol lines correspond to different time instances shown in the bottom inset of panel A. D: De-noised spatiotemporal cortical-flow profiles obtained by extracting the principal SVD mode of the noisy flow data. E: Stability plots using PDE-STRIDE+IHT-d to identify the stable PDE components (colored) and learn the model for aPAR protein interactions. F: Stability plots using PDE-STRIDE+IHT-d to identify the stable PDE components (colored) and learn the model for pPAR protein interactions. G: Achievability results testing the robustness and consistency of the inferred aPAR and pPAR models with increasing sample size. H: Simulation results of the learned model for the aPAR protein overlaid with the experimental data at different times. I: Simulation results of the learned model for the pPAR protein overlaid with the experimental data at different times.

4 Data-driven PDE inference on real experimental data to explain C. elegans zygote patterning

We showcase the applicability of PDE-STRIDE with IHT-d on real experimental data. We use microscopy images to infer a PDE model that explains early C. elegans embryo patterning, and we use it to confirm a previous hypothesis about the physics driving this biological process. Earlier studies proposed a mechano-chemical mechanism for PAR protein polarization on the cell membrane Etemad-Moghadam et al. (1995); Goehring et al. (2011); Gross et al. (2019), and they systematically showed that cortical flows provide sufficient perturbations to trigger polarization Goehring et al. (2011). The experiments conducted in Goehring et al. (2011) measured the concentration of the anterior PAR complex (aPAR), the concentration of the posterior PAR complex (pPAR), and the cortical flow field (v) as functions of time, as shown in Figure 10 A, B, D. The concentration and velocity fields were acquired on a grid resolving both space and time. These experimental data-sets were used to validate the mechano-chemical model developed in the studies Etemad-Moghadam et al. (1995); Goehring et al. (2011). Here, we challenge the PDE-STRIDE+IHT-d framework to learn a PDE model for the regulatory network (Eq. 4.0.1) of the interacting membrane PAR proteins in a purely data-driven manner from these experimental data. Given the noisy nature of the data-sets, our analysis is limited to the first SVD mode of the data, as shown in Figure 10 C. We also focus on the temporal regime after the advection trigger, when the early domains of the PAR proteins have already formed. PDE-STRIDE is then tasked with learning an interpretable model from the data that evolves the early protein domains into the fully developed patterns shown in Figure 10 A.

The reaction kinetics of the PAR proteins can be formulated as,

(4.0.1)

Here, the integer exponents denote the reactant and product stoichiometries, respectively, the two state variables correspond to the aPAR and pPAR protein species, and the prefactor is the reaction rate.

In designing the dictionary, the maximum allowed stoichiometry for reactants and products is restricted to 2. The PDE-STRIDE+IHT-d results for the regulatory reaction network learned from the data are shown in Figure 10 E, F; the stable components of the learned aPAR and pPAR models can be read off the stability plots. In Figure 10 G, achievability tests show the consistency and robustness of the learned models across different sample sizes, with full recovery probability reached beyond a moderate sample size. Our preliminary data-driven models exhibit very good qualitative agreement with the experiments and also recapitulate the mutually inhibitory nature of the PAR proteins. The parameters of the learned models are then computed by least-squares refitting and are tabulated in Table 4.
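A dictionary of candidate mass-action reaction terms with stoichiometries restricted to 2 can be constructed as follows (Python/NumPy; the variable names `a` and `p` for the aPAR and pPAR concentrations are our notation, not the paper's):

```python
import numpy as np

def reaction_dictionary(a, p, max_order=2):
    """Dictionary of candidate mass-action reaction terms a^i p^j with
    stoichiometries i, j up to max_order; returns the design matrix and
    human-readable column names."""
    columns, names = [], []
    for i in range(max_order + 1):
        for j in range(max_order + 1):
            columns.append(a**i * p**j)
            names.append("1" if i == j == 0 else f"a^{i} p^{j}")
    return np.column_stack(columns), names
```

The resulting matrix (9 columns for max_order = 2) is what the sparse regression then selects from.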

pPAR    -0.00019769    0.01073594    -0.00027887
aPAR     0.0041325     -0.00216077   -0.00014699
Table 4: Coefficient values of the inferred PAR model. The stable components are those inferred from the stability results in Figures 10 E and 10 F.

In Figures 10 H and 10 I, we overlay the numerical solution of the learned models with the de-noised experimental data at selected time snapshots for both the aPAR and pPAR proteins, for quantitative comparison. This very simple PDE model is able to describe the temporal evolution of the PAR protein domains on the C. elegans zygote. Although there is a good match in the spatial scales (PAR domain sizes) for the two proteins, a non-negligible discrepancy between simulation and experiment remains in the time scale of the pPAR evolution. This difference can be attributed to the advection processes driven by the cortical flows, which are not included in our simple ODE-type model. We believe that including higher modes of the SVD decomposition, and using structured sparsity to enforce symmetry arguments through grouping Ward Jr (1963), can further mature our data-driven models to include the mechanical aspects of the PAR system. In Supplementary Figure S8 (left), we show for one particular design that the advection and diffusion components of the aPAR protein model carry sufficient importance to be included in the stable set; this is not the case for the pPAR components, as shown in Figure S8 (right). The preferential advective displacement of the aPARs to the anterior side, modeled by the advective term, is also in line with the observations of the experimental studies Goehring et al. (2011). However, such models with advection and diffusion components are inconsistent under varying sample size, in contrast to our simple ODE-type model, as illustrated in Figure 10 G.

5 Conclusion and Discussion

We have addressed two key issues that have thus far limited the application of sparse regression methods for automated PDE inference from noisy and limited data: the need for manual parameter tuning and the high sensitivity to noise in the data. We have shown that stability selection combined with any sparsity-promoting regression technique provides an appropriate level of regularization for consistent and robust recovery of the correct PDE model. Our numerical benchmarks suggest that iterative hard thresholding with de-biasing (IHT-d) is an ideal companion for stability selection, together forming a robust and parameter-free framework (PDE-STRIDE) for PDE learning. This combination of methods outperformed all other tested algorithmic approaches with respect to identification performance, amount of data required, and robustness to noise. The resulting stability-based PDE-STRIDE method was tested for robust recovery of the 1D Burgers equation, the 2D vorticity transport equation, and the 3D Gray-Scott reaction-diffusion equations from simulation data corrupted with up to 6% additive Gaussian noise. The achievability studies demonstrated the consistency and robustness of the PDE-STRIDE method, reaching full recovery probability with increasing sample size for varying dictionary sizes and noise levels. In addition, we note that achievability plots provide a natural estimate of the sample complexity of the underlying non-linear dynamical system. This empirical estimate, however, depends on the choice of model selection algorithm and on how the data is sampled.

We demonstrated the capabilities of the PDE-STRIDE+IHT-d framework by applying it to learn a PDE model of embryo polarization directly from fluorescence microscopy images of C. elegans zygotes. The method recovered the regulatory reaction network of the involved proteins and their spatial transport dynamics in a purely data-driven manner, with no knowledge of the underlying physics or symmetries. The thus-learned, data-derived PDE model correctly predicted the spatiotemporal dynamics of the embryonic polarity system from the early spatial domains to the fully developed patterns observed in the polarized C. elegans zygote. The model we inferred from image data confirms both the structure and the mechanisms of popular physics-derived cell polarity models. Interestingly, the mutually inhibitory interactions between the involved protein species, which were previously discovered by extensive biochemical experimentation, were automatically extracted from the data.

Besides rendering sparse inference methods more robust to noise and parameter-free, stability selection has the important conceptual benefit of also providing interpretable probabilistic importance measures for all model components. This enables modelers to construct their models with high fidelity, and to gain an intuition about correlations and sensitivities. Graphical inspection of stability paths provides additional freedom for user intervention in semi-automated model discovery from data.

We expect that statistical learning methods have the potential to enable robust, consistent, and reproducible discovery of predictive and interpretable models directly from observational data. Our parameter-free PDE-STRIDE framework provides a first step toward this, but many open issues remain. First, numerically approximating time and space derivatives in noisy data is a challenge for noise levels higher than a few percent. This limits the noise robustness of the overall method, regardless of how robust the subsequent regression step is. The impact of noise becomes even more severe when exploring models with higher-order derivatives or stronger non-linearities. Future work should focus on formulations that are robust to the choice of discretization method, while providing the necessary freedom to impose structure on the coefficients. Second, a principled way to constrain the learning process with physical priors, such as conservation laws and symmetries, is lacking. Exploiting such structural knowledge about the dynamical system is expected to greatly improve learning performance. It should therefore be explored whether, e.g., structured sparsity or grouping constraints Kutz et al. (2017); Ward Jr (1963) can be adopted for this purpose. Especially in coupled systems, like the Gray-Scott reaction-diffusion system and the PAR-polarity model, known symmetries could be enforced through structured grouping constraints.

In summary, we believe that data-driven model discovery has tremendous potential to provide novel insights into complex processes, in particular in biology. It provides an effective and complementary alternative to hypothesis-driven approaches. We hope that the stability-based model selection method PDE-STRIDE presented here is going to contribute to the further development and adoption of these approaches in the sciences.

Code and data availability: The github repository for the codes and data can be found at https://github.com/SuryanarayanaMK/PDE-STRIDE.git.

Acknowledgements

This work was in parts supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2068 – 390729961. We are grateful to the Grill lab at MPI-CBG/TU Dresden for providing the spatiotemporal PAR concentration and flow field data and allowing us to use them in our showcase. We thank Nathan Kutz (University of Washington) and his group for making their code and data public.

References

  • Mogilner et al. [2006] Alex Mogilner, Roy Wollman, and Wallace F Marshall. Quantitative modeling in cell biology: what is it good for? Developmental cell, 11(3):279–287, 2006.
  • Sbalzarini [2013] Ivo F Sbalzarini. Modeling and simulation of biological systems from image data. Bioessays, 35(5):482–490, 2013.
  • Tomlin and Axelrod [2007] Claire J Tomlin and Jeffrey D Axelrod. Biology by numbers: mathematical modelling in developmental biology. Nature reviews genetics, 8(5):331, 2007.
  • Duffy [2013] Daniel J Duffy. Finite Difference methods in financial engineering: a Partial Differential Equation approach. John Wiley & Sons, 2013.
  • Adomian [1995] George Adomian. Solving the mathematical models of neurosciences and medicine. Mathematics and computers in simulation, 40(1-2):107–114, 1995.
  • Donoho [2017] David Donoho. 50 years of data science. Journal of Computational and Graphical Statistics, 26(4):745–766, 2017.
  • Barnes et al. [2011] Chris P Barnes, Daniel Silk, Xia Sheng, and Michael P H Stumpf. Bayesian design of synthetic biological systems. Proceedings of the National Academy of Sciences of the United States of America, 108(37):15190–15195, 2011. ISSN 0027-8424. doi: 10.1073/pnas.1017972108.
  • Asmus et al. [2017] Josefine Asmus, Christian L Müller, and Ivo F Sbalzarini. Lp-Adaptation: Simultaneous Design Centering and Robustness Estimation of Electronic and Biological Systems. Scientific Reports, 7(1):6660, 2017. ISSN 20452322. doi: 10.1038/s41598-017-03556-5.
  • Kitano [2002] Hiroaki Kitano. Computational systems biology. Nature, 420(6912):206, 2002.
  • Gregor et al. [2005] Thomas Gregor, William Bialek, Rob R de Ruyter van Steveninck, David W Tank, and Eric F Wieschaus. Diffusion and scaling during early embryonic pattern formation. Proceedings of the National Academy of Sciences, 102(51):18403–18407, 2005.
  • Chen et al. [1999] Ting Chen, Hongyu L He, and George M Church. Modeling gene expression with differential equations. In Biocomputing’99, pages 29–40. World Scientific, 1999.
  • Ay and Arnosti [2011] Ahmet Ay and David N Arnosti. Mathematical modeling of gene expression: a guide for the perplexed biologist. Critical Reviews in Biochemistry and Molecular Biology, 46(2):137–151, 2011.
  • Prost et al. [2015] Jacques Prost, Frank Jülicher, and Jean-François Joanny. Active gel physics. Nature Physics, 11(2):111, 2015.
  • Münster et al. [2019] Stefan Münster, Akanksha Jain, Alexander Mietke, Anastasios Pavlopoulos, Stephan W Grill, and Pavel Tomancak. Attachment of the blastoderm to the vitelline envelope affects gastrulation of insects. Nature, page 1, 2019.
  • Voss et al. [1998] H Voss, MJ Bünner, and Markus Abel. Identification of continuous, spatiotemporal systems. Physical Review E, 57(3):2820, 1998.
  • Breiman and Friedman [1985] Leo Breiman and Jerome H. Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 1985. ISSN 1537274X. doi: 10.1080/01621459.1985.10478157.
  • Bär et al. [1999] Markus Bär, Rainer Hegger, and Holger Kantz. Fitting partial differential equations to space-time dynamics. Physical Review E, 59(1):337, 1999.
  • Xun et al. [2013] Xiaolei Xun, Jiguo Cao, Bani Mallick, Arnab Maity, and Raymond J. Carroll. Parameter estimation of partial differential equation models. Journal of the American Statistical Association, 108(503):1009–1020, 2013. ISSN 01621459. doi: 10.1080/01621459.2013.794730.
  • Raissi et al. [2017] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Machine learning of linear differential equations using gaussian processes. Journal of Computational Physics, 348:683–693, 2017.
  • Brunton et al. [2016] Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, page 201517384, 2016.
  • Rudy et al. [2017] Samuel H Rudy, Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Data-driven discovery of partial differential equations. Science Advances, 3(4):e1602614, 2017.
  • Schaeffer [2017] Hayden Schaeffer. Learning partial differential equations via data discovery and sparse optimization. Proc. R. Soc. A, 473(2197):20160446, 2017.
  • Zhang and Lin [2018] Sheng Zhang and Guang Lin. Robust data-driven discovery of governing physical laws with error bars. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science, 474(2217):20180305, 2018. ISSN 1364-5021. doi: 10.1098/rspa.2018.0305. URL http://rspa.royalsocietypublishing.org/lookup/doi/10.1098/rspa.2018.0305.
  • Mangan et al. [2019] N M Mangan, T Askham, S L Brunton, J N Kutz, and J L Proctor. Model selection for hybrid dynamical systems via sparse regression. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 475(2223):1–22, 2019. ISSN 14712946. doi: 10.1098/rspa.2018.0534.
  • Long et al. [2017] Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-Net: Learning PDEs from data. arXiv preprint arXiv:1710.09668, 2017.
  • Raissi and Karniadakis [2018] Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018.
  • Raissi et al. [2019] M Raissi, P Perdikaris, and G E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019. ISSN 10902716. doi: 10.1016/j.jcp.2018.10.045. URL https://doi.org/10.1016/j.jcp.2018.10.045.
  • Long et al. [2018] Zichao Long, Yiping Lu, and Bin Dong. PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network. arXiv preprint arXiv:1812.04426, 2018.
  • Dong et al. [2017] Bin Dong, Qingtang Jiang, and Zuowei Shen. Image restoration: Wavelet frame shrinkage, nonlinear evolution PDEs, and beyond. Multiscale Modeling & Simulation, 15(1):606–660, 2017.
  • Meinshausen and Bühlmann [2010] Nicolai Meinshausen and Peter Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417–473, 2010.
  • Shah and Samworth [2013] Rajen D Shah and Richard J Samworth. Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1):55–80, 2013.
  • Tibshirani [1996] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
  • Blumensath and Davies [2008] Thomas Blumensath and Mike E Davies. Iterative thresholding for sparse approximations. Journal of Fourier analysis and Applications, 14(5-6):629–654, 2008.
  • Foucart [2011] Simon Foucart. Hard thresholding pursuit: an algorithm for compressive sensing. SIAM Journal on Numerical Analysis, 49(6):2543–2563, 2011.
  • Gross et al. [2019] Peter Gross, K Vijay Kumar, Nathan W Goehring, Justin S Bois, Carsten Hoege, Frank Jülicher, and Stephan W Grill. Guiding self-organized pattern formation in cell polarity establishment. Nature Physics, 15(3):293, 2019.
  • Chartrand [2011] Rick Chartrand. Numerical differentiation of noisy, nonsmooth data. ISRN Applied Mathematics, 2011, 2011.
  • Stickel [2010] Jonathan J Stickel. Data smoothing and numerical differentiation by a regularization method. Computers & Chemical Engineering, 34(4):467–475, 2010.
  • Wu and Lange [2008] Tong Tong Wu and Kenneth Lange. Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics, 2(1):224–244, 2008. ISSN 19326157. doi: 10.1214/07-AOAS147.
  • Friedman et al. [2010] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1):1, 2010.
  • Eckstein and Bertsekas [1992] Jonathan Eckstein and Dimitri P Bertsekas. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1-3):293–318, 1992.
  • Combettes and Pesquet [2011] Patrick L Combettes and Jean-Christophe Pesquet. Proximal splitting methods in signal processing. In Fixed-point algorithms for inverse problems in science and engineering, pages 185–212. Springer, 2011.
  • Beck and Teboulle [2009] Amir Beck and Marc Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm. Society for Industrial and Applied Mathematics Journal on Imaging Sciences, 2(1):183–202, 2009. ISSN 1936-4954. doi: 10.1137/080716542.
  • Meinshausen et al. [2006] Nicolai Meinshausen, Peter Bühlmann, et al. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462, 2006.
  • Zhao and Yu [2006] Peng Zhao and Bin Yu. On model selection consistency of lasso. Journal of Machine Learning Research, 7(Nov):2541–2563, 2006.
  • Fan and Li [2001] Jianqing Fan and Runze Li. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456):1348–1360, 2001. ISSN 0162-1459. doi: 10.1198/016214501753382273.
  • Zhang [2010] Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2):894–942, 2010. doi: 10.1214/09-AOS729.
  • Tropp [2004] Joel A Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.
  • Needell and Tropp [2009] Deanna Needell and Joel A Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and computational harmonic analysis, 26(3):301–321, 2009.
  • Dai and Milenkovic [2009] Wei Dai and Olgica Milenkovic. Subspace pursuit for compressive sensing signal reconstruction. IEEE Transactions on Information Theory, 55(5):2230–2249, 2009.
  • Blumensath and Davies [2009] Thomas Blumensath and Mike E Davies. Iterative hard thresholding for compressed sensing. Applied and computational harmonic analysis, 27(3):265–274, 2009.
  • Herrity et al. [2006] Kyle K Herrity, Anna C Gilbert, and Joel A Tropp. Sparse approximation via iterative thresholding. In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, volume 3, pages III–III. IEEE, 2006.
  • Tropp [2006] Joel A Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3):1030–1051, 2006.
  • Figueiredo et al. [2007] Mário AT Figueiredo, Robert D Nowak, and Stephen J Wright. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE Journal of Selected Topics in Signal Processing, 1(4):586–597, 2007.
  • Kohavi [1995] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence (IJCAI), 1995.
  • Schwarz [1978] Gideon Schwarz. Estimating the Dimension of a Model. The Annals of Statistics, 1978. ISSN 0090-5364. doi: 10.1214/aos/1176344136.
  • Lederer and Müller [2015] Johannes Lederer and Christian L. Müller. Don’t fall for tuning parameters: Tuning-free variable selection in high dimensions with the TREX. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI 2015), pages 2729–2735. AAAI Press, 2015.
  • Bien et al. [2018] Jacob Bien, Irina Gaynanova, Johannes Lederer, and Christian L. Müller. Non-Convex Global Minimization and False Discovery Rate Control for the TREX. Journal of Computational and Graphical Statistics, 27(1):23–33, 2018. ISSN 15372715. doi: 10.1080/10618600.2017.1341414. URL http://arxiv.org/abs/1604.06815.
  • Yu [2013] Bin Yu. Stability. Bernoulli, 19(4):1484–1500, 2013. ISSN 1350-7265. doi: 10.3150/13-BEJSP14. URL http://projecteuclid.org/euclid.bj/1377612862.
  • Liu et al. [2010] Han Liu, Kathryn Roeder, and Larry Wasserman. Stability Approach to Regularization Selection (StARS) for high dimensional graphical models. Advances in Neural Information Processing Systems, 24(2):1432–1440, 2010. ISSN 1049-5258.
  • Bühlmann et al. [2014] Peter Bühlmann, Markus Kalisch, and Lukas Meier. High-dimensional statistics with a view toward applications in biology. Annual Review of Statistics and Its Application, 1:255–278, 2014.
  • Wang et al. [2011] Sijian Wang, Bin Nan, Saharon Rosset, and Ji Zhu. Random lasso. The Annals of Applied Statistics, 5(1):468, 2011.
  • Temam [2001] Roger Temam. Navier-Stokes equations: theory and numerical analysis, volume 343. American Mathematical Soc., 2001.
  • Incardona et al. [2019] Pietro Incardona, Antonio Leo, Yaroslav Zaluzhnyi, Rajesh Ramaswamy, and Ivo F Sbalzarini. OpenFPM: A scalable open framework for particle and particle-mesh codes on parallel computers. Computer Physics Communications, 2019.
  • Meinhardt [1982] Hans Meinhardt. Models of biological pattern formation. New York, 1982.
  • Manukyan et al. [2017] Liana Manukyan, Sophie A Montandon, Anamarija Fofonjka, Stanislav Smirnov, and Michel C Milinkovitch. A living mesoscopic cellular automaton made of skin scales. Nature, 544(7649):173, 2017.
  • Wainwright [2009] Martin J Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(5):2183–2202, 2009.
  • Etemad-Moghadam et al. [1995] Bijan Etemad-Moghadam, Su Guo, and Kenneth J Kemphues. Asymmetrically distributed PAR-3 protein contributes to cell polarity and spindle alignment in early C. elegans embryos. Cell, 83(5):743–752, 1995.
  • Goehring et al. [2011] Nathan W Goehring, Philipp Khuc Trong, Justin S Bois, Debanjan Chowdhury, Ernesto M Nicola, Anthony A Hyman, and Stephan W Grill. Polarization of PAR proteins by advective triggering of a pattern-forming system. Science, 334(6059):1137–1141, 2011.
  • Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
  • Kutz et al. [2017] J Nathan Kutz, Samuel H Rudy, Alessandro Alla, and Steven L Brunton. Data-driven discovery of governing physical laws and their parametric dependencies in engineering, physics and biology. In 2017 IEEE 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 1–5. IEEE, 2017.
  • Donoho and Gavish [2013] David L Donoho and Matan Gavish. The optimal hard threshold for singular values is 4/√3, 2013.
  • Hansen [1990] Per Christian Hansen. Truncated singular value decomposition solutions to discrete ill-posed problems with ill-determined numerical rank. SIAM Journal on Scientific and Statistical Computing, 11(3):503–518, 1990.

6 Supplementary Material

6.1 Algorithm

Problem: min_ξ ½‖U_t − Θξ‖₂² + λ‖ξ‖₀
IHT-d:
Initialize: ξ⁰ = 0
for k = 0, 1, …, K−1 do
      ξ^(k+1) = H_τ( ξ^k + γ Θᵀ(U_t − Θξ^k) ),  τ = √(2γλ)
      S = supp(ξ^(k+1))
      for j = 0, 1, …, J−1 do   (de-biasing restricted to the support S)
            ξ^(k+1)_S = ξ^(k+1)_S + γ_d Θ_Sᵀ(U_t − Θ_S ξ^(k+1)_S)
            if ‖Θ_Sᵀ(U_t − Θ_S ξ^(k+1)_S)‖₂ < tol then
                  break
            end if
      end for
end for
return ξ^K
IHT:
Initialize: ξ⁰ = 0
for k = 0, 1, …, K−1 do
      ξ^(k+1) = H_τ( ξ^k + γ Θᵀ(U_t − Θξ^k) ),  τ = √(2γλ)
end for

Problem: min_ξ ½‖U_t − Θξ‖₂²  subject to  ‖ξ‖₀ ≤ s
HTP:
Initialize: ξ⁰ = 0
for k = 0, 1, …, K−1 do
      g^(k+1) = ξ^k + γ Θᵀ(U_t − Θξ^k)
      S^(k+1) = indices of the s largest-magnitude entries of g^(k+1)
      ξ^(k+1) = argmin_z { ‖U_t − Θz‖₂ : supp(z) ⊆ S^(k+1) }
end for
Algorithm 1: Iterative hard thresholding (IHT), its de-biased variant (IHT-d), and hard thresholding pursuit (HTP). The hard-threshold operator H_τ(·) sets to zero all entries of its argument with magnitude below τ.

In the above algorithm, γ and γ_d denote the step sizes (learning rates) of the IHT and de-biasing steps, respectively. The learning rate γ is computed as the inverse of the Lipschitz constant of the gradient of the square-loss function in Eq. (2.1.5), i.e., γ = 1/‖ΘᵀΘ‖₂. Similarly, the learning rate of the de-biasing step is computed as γ_d = 1/‖Θ_SᵀΘ_S‖₂, where ‖Θ_SᵀΘ_S‖₂ is the Lipschitz constant of the gradient of the square loss restricted to the support set S.
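The IHT update and a final de-biasing pass can be condensed into a short numerical sketch. The following is an illustrative implementation only, assuming an L0-penalized formulation with dictionary matrix Theta and time-derivative data u_t; the function name, the hard-threshold rule, and the iteration counts are not taken from the paper.

```python
import numpy as np

def iht_debias(Theta, u_t, lam, n_iter=500, n_debias=50):
    """Sketch of L0-penalized iterative hard thresholding with de-biasing.

    Theta : (m, p) dictionary of candidate PDE components
    u_t   : (m,) time-derivative data
    lam   : L0 regularization strength (sets the hard threshold)
    """
    m, p = Theta.shape
    # Step size = inverse Lipschitz constant of the square-loss gradient.
    gamma = 1.0 / np.linalg.norm(Theta.T @ Theta, 2)
    xi = np.zeros(p)
    for _ in range(n_iter):
        # Gradient step on the square loss, followed by hard thresholding.
        xi = xi - gamma * Theta.T @ (Theta @ xi - u_t)
        xi[np.abs(xi) < np.sqrt(2.0 * gamma * lam)] = 0.0
    # De-biasing: gradient descent restricted to the recovered support S.
    S = np.flatnonzero(xi)
    if S.size > 0:
        Theta_S = Theta[:, S]
        gamma_d = 1.0 / np.linalg.norm(Theta_S.T @ Theta_S, 2)
        for _ in range(n_debias):
            xi[S] = xi[S] - gamma_d * Theta_S.T @ (Theta_S @ xi[S] - u_t)
    return xi
```

Because the full gradient is re-evaluated at every iteration, components thresholded out early can re-enter the support later, which is what distinguishes IHT from a one-shot threshold of the least-squares solution.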

Figure S1: Model selection with PDE-STRIDE+STRidge for 3D Gray-Scott component equation recovery: The stability plots show the separation of the true PDE components (in solid color) from the noisy components. The inference power of the PDE-STRIDE method is tested up to the largest additive Gaussian noise level considered (not shown). PDE-STRIDE fails to identify the full true support of the component PDE, owing to the small diffusion coefficients associated with the unidentified diffusive components. However, the plots show consistent recovery of the reaction terms up to the largest additive Gaussian noise levels. The inset at the bottom shows the correspondence of colors to the PDE components.
  noise level | coefficient 1 | coefficient 2 | coefficient 3
  clean       | 0             | −0.0669       | 0.9999
              | 0             | −0.0666       | 0.9910
              | 0.0001        | −0.0667       | 0.9840
              | 0.0001        | −0.0657       | 0.9677
Table S1: Coefficients of the recovered component of the Gray-Scott reaction-diffusion equation for different noise levels. The stable components of the PDE are those shown in Figure S1.

6.2 Denoising technique

For all datasets considered, both from experiments and from simulation, we use singular value decomposition (SVD) for denoising. Denoising is achieved by identifying the “elbow” in the singular-value plot and applying a hard threshold at the elbow to filter out the noise [71, 72].
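This hard-thresholding of singular values can be sketched as follows; the function and parameter names are illustrative and not from the paper, and in practice the threshold would be placed at the elbow of the singular-value plot.

```python
import numpy as np

def svd_denoise(U, rank=None, threshold=None):
    """Denoise a space-time data matrix by truncated SVD.

    Keep either the leading `rank` modes, or (if `rank` is None) all
    modes whose singular value exceeds `threshold`, i.e., the modes
    above the elbow of the singular-value plot.
    """
    W, s, Vt = np.linalg.svd(U, full_matrices=False)
    if rank is None:
        rank = int(np.sum(s > threshold))
    # Reassemble the matrix from the retained leading modes only.
    return (W[:, :rank] * s[:rank]) @ Vt[:rank, :]
```

For a low-rank signal corrupted by small additive noise, the truncated reconstruction lies closer to the clean signal than the raw data, since the discarded trailing modes carry mostly noise.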

Figure S2: Truncated singular value decomposition (SVD) used for denoising the 1D Burgers dataset.
Figure S3: Truncated singular value decomposition (SVD) used for denoising the 1D Burgers dataset.
Figure S4: Model selection with PDE-STRIDE+STRidge for 1D Burgers equation inference: The plots show the identification of the true PDE components (in solid color) among the noisy components. The stability-selection parameters and number of repetitions are as specified. The statistical power of the algorithm is tested up to the largest additive Gaussian noise level considered. In all cases, perfect recovery was possible with the fixed importance threshold shown by the red solid line. The inset at the bottom shows the correspondence of colors to the respective PDE components.
Figure S5: Model selection with PDE-STRIDE+STRidge for 2D vorticity-transport equation inference: The plots show the identification of the true PDE components (in solid color) among the noisy components. The stability-selection parameters and number of repetitions are as specified. The statistical power of the algorithm is tested up to the largest additive Gaussian noise level considered. In all cases, perfect recovery was possible with the fixed importance threshold shown by the dark blue solid line. The inset at the bottom shows the correspondence of colors to the respective PDE components.
Figure S6: Model selection with PDE-STRIDE+STRidge for inference of one component of the Gray-Scott reaction-diffusion equation: The plots show the identification of the true PDE components (in solid color) among the noisy components. The stability-selection parameters and number of repetitions are as specified. The statistical power of the algorithm is tested up to the largest additive Gaussian noise level considered. In all cases, perfect recovery was possible with the fixed importance threshold shown by the dark blue solid line. The inset at the bottom shows the correspondence of colors to the respective PDE components.
Figure S7: Model selection with PDE-STRIDE+STRidge for inference of the other component of the Gray-Scott reaction-diffusion equation: The plots show the identification of the true PDE components (in solid color) among the noisy components. The stability-selection parameters and number of repetitions are as specified. The statistical power of the algorithm is tested up to the largest additive Gaussian noise level considered. In all cases, perfect recovery was possible with the fixed importance threshold shown by the dark blue solid line. The inset at the bottom shows the correspondence of colors to the respective PDE components.
Figure S8: Model selection with PDE-STRIDE+IHT-d for PAR model inference: The stability results for both aPAR (left) and pPAR (right) are shown. In the stability set for aPAR, the advection-dominant term also appears. The diffusion term is likewise seen very close to the threshold.
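The importance probabilities plotted in the stability figures above are obtained by counting how often each candidate PDE component is selected across random subsamples of the data. A minimal sketch of this scoring, with illustrative names and an arbitrary sparsity-promoting base solver passed in as an argument (PDE-STRIDE would use, e.g., IHT-d or STRidge here):

```python
import numpy as np

def importance_scores(Theta, u_t, sparse_solver, n_repeats=100, frac=0.5, seed=0):
    """Estimate per-component importance probabilities by stability selection:
    run a sparsity-promoting solver on random subsamples of the rows of
    (Theta, u_t) and count how often each candidate component enters the
    recovered support.
    """
    rng = np.random.default_rng(seed)
    m, p = Theta.shape
    counts = np.zeros(p)
    for _ in range(n_repeats):
        # Draw a random subsample of the data points (rows).
        idx = rng.choice(m, size=int(frac * m), replace=False)
        xi = sparse_solver(Theta[idx], u_t[idx])
        counts += (xi != 0)
    # Components with score above the importance threshold are selected.
    return counts / n_repeats
```

Components of the true PDE tend to be selected on almost every subsample, while noise-induced components appear only sporadically, which is what produces the separation visible in the stability plots.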