## 1 Introduction

In recent years, pioneering research has been conducted into the application of machine learning to computational physics and engineering: example works include RAISSI2017683; RAISSI2018125; RAISSI2019686; Rudye1602614; Brunton3932; Lusch_2018. As a result, new sub-fields within the computational sciences have emerged, including, but not limited to, physics-informed machine learning nature_karniadakis and scientific machine learning (Sci-ML) osti_1478744. Within these sub-fields, the development of machine-learning-based methods to infer the parameters of the governing equations of dynamical systems, and/or to discover new partial differential equations (PDEs), has attracted significant attention and seen important early successes. In this early work, different inference strategies have been proposed. A popular method, known as SINDy Brunton3932, builds a dataset of spatial derivatives that are potentially involved in the governing equation of the observed physics, and then performs a sparse linear regression to estimate the coefficient of each derivative term in some latent PDE. This seminal work was later extended in Rudye1602614, where the PDE derivative terms were computed through either polynomial interpolation or finite differences, and in Xu_2021; XU2021110592; Both_2021 using neural networks. A major advantage of these methods is their interpretability: the spatial derivatives involved in the PDE and their coefficients are discovered explicitly. Other methods directly approximate the differential operator using a physics-informed neural network (PINN) RAISSI2019686. While these methods are generally highly accurate, they lack interpretability JMLR:v19:18-046 or only allow recovering the coefficients of a PDE whose functional form is already known RAISSI2019686.

The sparse-regression-based methods outlined above have provided promising results, but the regression system is often poorly conditioned, and the spatial derivative estimates must be highly accurate to produce satisfactory results. In Xu_2021; XU2021110592; Both_2021, measured data from a physical system are interpolated using a neural network and then differentiated to create the spatial derivative dataset. This requires abundant data with as little noise as possible. In practical engineering and scientific applications, however, acquiring enough data (typically experimental measurements) may be a very expensive and time-consuming process. While deep and dense neural networks are able to capture complex physics described in snapshots of physical system dynamics (e.g. shocks or sharp gradients), they are also more likely to overfit noisy measurements. Unfortunately, methods for fine-tuning the network's parameters (e.g. cross-validation) may be inapplicable due to the lack of abundant data. Consequently, there is a trade-off: use more data for training and potentially obtain better derivative estimates, at a significant risk of overfitting, or use more data for testing and limit overfitting, at a greater risk of missing complex underlying physics.

Bayesian methods, in particular Bayesian neural networks (BNNs) 10.5555/525544; 10.5555/1162264; Goan_2020, are promising for avoiding this trade-off, as they are naturally less prone to overfitting, even with very noisy and sparse data. Furthermore, BNNs provide confidence intervals on their predictions, which can be used to improve the accuracy of PDE recovery techniques. In the field of PDE recovery, Bayesian machine learning methods have been used to infer parameters of governing equations with known functional form. For example, RAISSI2017683 used Gaussian processes, and Yang_2021; MENG2021110361 used BNNs combined with PINNs to infer PDE parameters, but these approaches do not permit the discovery of unknown PDEs. In this paper, we extend the original approach outlined in Brunton3932; Rudye1602614; Xu_2021; Both_2021 in two ways. First, we propose to use a BNN to interpolate the snapshot measurements from the physical system. Relying on BNNs offers two major advantages: it makes the model more robust to noise and sparse data (allowing for minimal tuning of the model's parameters), and it provides valuable confidence intervals over the interpolation predictions. Second, we propose to use the BNN confidence intervals to quantify the uncertainty over the spatial derivative dataset, and to introduce this uncertainty into a sparse Bayesian linear regression model to recover the PDE coefficients.

## 2 Bayesian Neural Network

In a BNN, the weights $w$ are assumed to be sampled from a prior probability distribution, here taken as a Gaussian with zero mean and unit standard deviation:

(1) $p(w) = \mathcal{N}(w \mid 0, I)$

The posterior over the weights given the data $\mathcal{D}$ follows from Bayes' rule:

(2) $p(w \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})}$

Due to the highly non-linear nature of a neural network, the likelihood is a very complex function and sampling from the posterior is analytically intractable. We can either approximate the posterior with e.g. a Gaussian distribution and find the parameters of the Gaussian by minimizing the KL divergence (variational inference) NIPS2011_7eb3c8be; Goan_2020; jospin2020handson; Blei_2017, or use Monte-Carlo sampling 10.5555/525544; cobb2020scaling; jospin2020handson; Goan_2020. Variational inference for BNNs is generally inefficient and often fails to provide meaningful confidence intervals Goan_2020; Blei_2017; pmlr-vR5-wang05a; turner_sahani_2011. Here, we rely on Hamiltonian Monte-Carlo (HMC) instead.

The intuition behind HMC is to flip the posterior probability distribution upside-down and view the regions of high probability as pits. If a virtual particle is placed on the flipped distribution and starts rolling freely, it will naturally move towards the regions of lower potential energy (the pits), i.e. the regions of higher probability. HMC has two steps. First, it simulates the motion of such a particle using Hamiltonian dynamics and records its position over time, which provides a set of sample candidates; then it uses a Metropolis-Hastings step to either accept or reject these samples. The Hamiltonian step generally provides suitable samples and allows for a higher acceptance rate than other MCMC methods 10.5555/525544.

## 3 Inversion Framework for PDE Recovery and Discovery

We consider measurements from a given physical phenomenon in input-output form, where the input is a time-space coordinate and the output is a corresponding measured physical quantity (potentially noisy). All the data are generated from numerical solutions of PDEs using either a standard Runge-Kutta scheme (heat equation, section 4.1) or Chebyshev polynomials with Runge-Kutta time integration (Burgers equation and Korteweg-de Vries equation, sections 4.2 and 4.3), using the Chebfun MATLAB package Driscoll2014.

Unless specified otherwise, we assume 16 measurement sensors placed at random space locations that record the physical quantity of interest over time with a fixed time step. The input data have the form $(t, x)$, where $t$ is time and $x$ is the sensor Cartesian coordinate, and the output is $y = u(t, x) + \epsilon$, where $u$ is the PDE solution and $\epsilon$ is noise corrupting the data. We consider three noise cases, ranging from noiseless to strongly corrupted.
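As a minimal sketch of this measurement setup, the following builds a sensor-style dataset from a known field. The field `u_true`, the domain bounds, the number of recording times, and the noise level are illustrative assumptions standing in for the paper's numerical PDE solutions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a numerical PDE solution on the (t, x) domain.
def u_true(t, x):
    return np.exp(-0.1 * t) * np.sin(np.pi * x)

n_sensors = 16
x_sensors = rng.uniform(-1.0, 1.0, size=n_sensors)  # random sensor locations
t_steps = np.linspace(0.0, 1.0, 50)                  # recording times

# Inputs are (t, x) pairs; outputs are noisy point measurements.
T, X = np.meshgrid(t_steps, x_sensors, indexing="ij")
inputs = np.stack([T.ravel(), X.ravel()], axis=1)    # shape (50 * 16, 2)
sigma = 0.1                                          # one of the noise cases
outputs = u_true(T, X).ravel() + sigma * rng.normal(size=T.size)
```

Each row of `inputs` pairs a recording time with a sensor coordinate, mirroring the "16 sensors recorded over time" layout described above.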

The general inversion framework for PDE recovery is outlined in the following subsections.

### 3.1 Bayesian Neural Network Fitting

We first fit the measurement data using a BNN. In all the following sections, the BNN has the same architecture: 4 fully connected hidden layers of 50 hidden units each, with the same activation function throughout. Unless specified otherwise, the prior over each neural network weight is a standard Gaussian distribution (zero mean and unit standard deviation), and the noise standard deviation in the likelihood function is fixed. In the HMC sampling we draw a fixed number of samples with a constant integration step size.
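The HMC fitting of section 2 can be sketched in plain PyTorch on a deliberately tiny network (one hidden layer of 8 units rather than the 4x50 architecture above); the toy data, step size, and trajectory length are illustrative assumptions, not the paper's settings:

```python
import torch

torch.manual_seed(0)

# Toy 1D regression data standing in for the sensor measurements.
x = torch.linspace(-1, 1, 32).unsqueeze(1)
y = torch.sin(3 * x) + 0.05 * torch.randn_like(x)

D = 1 * 8 + 8 + 8 * 1 + 1  # flattened weights of a 1-8-1 tanh network

def forward(w, xin):
    w1, b1 = w[:8].view(1, 8), w[8:16]
    w2, b2 = w[16:24].view(8, 1), w[24:25]
    return torch.tanh(xin @ w1 + b1) @ w2 + b2

def U(w, noise_std=0.2):
    # Negative log posterior: Gaussian likelihood + standard Gaussian prior.
    resid = (y - forward(w, x)) / noise_std
    return 0.5 * resid.pow(2).sum() + 0.5 * w.pow(2).sum()

def grad_U(w):
    w = w.detach().requires_grad_(True)
    (g,) = torch.autograd.grad(U(w), w)
    return g

def hmc_sample(n_samples=50, eps=1e-3, n_leapfrog=20):
    w, samples = torch.zeros(D), []
    for _ in range(n_samples):
        p = torch.randn(D)                        # fresh momentum draw
        w_new, p_new = w.clone(), p.clone()
        p_new -= 0.5 * eps * grad_U(w_new)        # leapfrog integration
        for i in range(n_leapfrog):
            w_new = w_new + eps * p_new
            if i < n_leapfrog - 1:
                p_new -= eps * grad_U(w_new)
        p_new -= 0.5 * eps * grad_U(w_new)
        # Metropolis-Hastings accept/reject on the total Hamiltonian.
        h_old = U(w) + 0.5 * p.pow(2).sum()
        h_new = U(w_new) + 0.5 * p_new.pow(2).sum()
        if torch.rand(()) < torch.exp(h_old - h_new):
            w = w_new.detach()
        samples.append(w.clone())
    return torch.stack(samples)

samples = hmc_sample()
```

In practice a dedicated library such as Hamiltorch (used in section 3.4) handles the integrator tuning; the sketch only makes the leapfrog/accept-reject structure explicit.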

### 3.2 Surrogate Derivative Dataset

Once the BNN is properly trained, we sample a set of time-space coordinates uniformly at random over the time-space domain. The BNN is differentiated multiple times with respect to space and time and evaluated at each sampled input. The derivatives are then averaged over the set of weight samples. For example, denoting by $f_{w_i}$ the BNN evaluated at the $i$-th of $N$ weight samples, the time derivative reads:

(3) $\dfrac{\partial u}{\partial t}(t, x) \approx \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{\partial f_{w_i}}{\partial t}(t, x)$

Using this approach, we build a surrogate dataset of successive spatial derivatives potentially involved in the underlying governing PDE. The set of spatial derivative orders included is arbitrary, and we may include non-linear terms as well (the total number of derivative candidates is fixed). Similarly, we compute the corresponding time derivative output vector. We also compute the variance of each spatial derivative and sum them to quantify the uncertainty over each surrogate derivative (covariance matrix):

(4) |

(5) |

(6) |

(7) |

(8) |
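The surrogate dataset construction can be sketched with automatic differentiation. The small `torch.nn` networks below stand in for posterior weight samples of the BNN, and the candidate library (first and second space derivatives plus one non-linear term) is an illustrative assumption:

```python
import torch

torch.manual_seed(0)

# Stand-ins for posterior weight samples: in the real pipeline each of
# these would be the BNN evaluated at one HMC weight sample.
def make_net():
    return torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                               torch.nn.Linear(16, 1))

nets = [make_net() for _ in range(5)]

# Random collocation points (t, x) drawn uniformly over the domain.
pts = torch.rand(64, 2, requires_grad=True)

def derivatives(net, pts):
    u = net(pts)
    (du,) = torch.autograd.grad(u.sum(), pts, create_graph=True)
    u_t, u_x = du[:, 0], du[:, 1]                 # time and space derivatives
    (d2,) = torch.autograd.grad(u_x.sum(), pts, create_graph=True)
    u_xx = d2[:, 1]                               # second space derivative
    return u.squeeze(-1), u_t, u_x, u_xx

# Average each derivative over the weight samples; the spread across
# samples gives the per-point uncertainty used in the weighted regression.
stacks = {k: [] for k in ("u", "u_t", "u_x", "u_xx")}
for net in nets:
    vals = derivatives(net, pts)
    for k, v in zip(stacks, vals):
        stacks[k].append(v.detach())

mean = {k: torch.stack(v).mean(0) for k, v in stacks.items()}
var = {k: torch.stack(v).var(0) for k, v in stacks.items()}

# Candidate library Theta: one column per derivative term, including a
# non-linear candidate u * u_x; the target is the averaged time derivative.
Theta = torch.stack([mean["u_x"], mean["u_xx"], mean["u"] * mean["u_x"]], dim=1)
u_t = mean["u_t"]
```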

### 3.3 Bayesian Linear Regression Fitting

The surrogate dataset of derivatives is fitted with a Bayesian linear regression (BLR) in order to approximate the coefficient associated with each derivative. This is a sparse regression problem: the set of derivatives actually involved in the ground-truth PDE is expected to be much smaller than the entire set of surrogate derivatives, so most coefficients should be 0. A BLR with Gaussian priors is therefore well suited, as it naturally shrinks most coefficients towards 0. Furthermore, a BLR provides confidence intervals over the recovered PDE coefficients, and since the uncertainty over each surrogate derivative is well quantified, it automatically discards the influence of highly uncertain derivatives. The regression model can be written as follows:

(9) $\mathbf{u}_t = \Theta \boldsymbol{\xi} + \boldsymbol{\epsilon}$

where $\Theta$ is the matrix of surrogate derivative candidates, $\boldsymbol{\xi}$ the vector of PDE coefficients, and $\boldsymbol{\epsilon}$ a Gaussian noise term.

Note that in a standard BLR with Gaussian noise, the variance of $\boldsymbol{\epsilon}$ corresponds to the assumed variance of the output $\mathbf{u}_t$. Here, however, the variances of the spatial derivatives are more informative, so we introduce the scaled variance of the inputs instead. Equation (9) leads to the posterior distribution over the PDE coefficients:

(10) $p(\boldsymbol{\xi} \mid \mathbf{u}_t, \Theta) \propto p(\mathbf{u}_t \mid \Theta, \boldsymbol{\xi})\, p(\boldsymbol{\xi})$

The prior and noise variances form a set of hyperparameters chosen to maximize the marginal log-likelihood. The coefficient of each derivative can be taken as the mean value with respect to the posterior, with associated confidence intervals. Here, Bayesian inference is analytically tractable and the expected values of interest can be computed exactly.

### 3.4 Error Metric and Computer Implementation

For assessing the accuracy of our models, we use the norm of the difference between the ground-truth vector of PDE coefficients $\boldsymbol{\xi}_{\mathrm{true}}$ and the recovered vector $\hat{\boldsymbol{\xi}}$:

(11) $E = \lVert \boldsymbol{\xi}_{\mathrm{true}} - \hat{\boldsymbol{\xi}} \rVert$

For baseline comparison, a standard deep neural network (DNN) is also trained with the same architecture as the BNN, using a mean-squared-error loss with a fixed learning rate and number of iterations. We also compare the BLR with a standard linear regression, with and without Lasso regularization and surrogate derivative variance weighting.

To implement our neural network models we use PyTorch NEURIPS2019_9015. HMC sampling is performed using the Hamiltorch add-on library cobb2020scaling and the linear regressions for recovering PDE coefficients are performed using scikit-learn scikit-learn.
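The regression comparison can be sketched with scikit-learn. The synthetic library below (three candidate columns, one active term with coefficient 0.5) is an illustrative assumption, not the paper's data; per-point derivative variances could additionally be folded in through the `sample_weight` argument of the `fit` methods:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, Lasso, LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the surrogate dataset: 3 candidate derivative
# columns, with only the second one active (true coefficient 0.5).
Theta = rng.normal(size=(200, 3))
xi_true = np.array([0.0, 0.5, 0.0])
u_t = Theta @ xi_true + 0.01 * rng.normal(size=200)

models = {
    "least squares": LinearRegression(fit_intercept=False),
    "lasso": Lasso(alpha=1e-3, fit_intercept=False),
    "bayesian ridge": BayesianRidge(fit_intercept=False),
}
for name, model in models.items():
    model.fit(Theta, u_t)
    # Error metric of section 3.4: norm of the coefficient mismatch.
    err = np.linalg.norm(xi_true - model.coef_)
    print(f"{name:15s} coef={np.round(model.coef_, 3)} err={err:.4f}")
```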

## 4 Application

### 4.1 Heat Equation

Let us consider the following 1D heat equation:

(12) $\dfrac{\partial u}{\partial t} = \alpha \dfrac{\partial^2 u}{\partial x^2}$

(13) |

The temperature is recorded at a fixed time interval for each of the 16 sensors; the ground-truth solution is shown in figure 1. The predictions of the BNN and DNN are shown in figure 2.

Table 1 shows the RMSE between the BNN predictive mean, the DNN prediction, and the ground truth over 1000 random input points, and figure 3 shows the BNN and DNN predictions at different times. In this specific case, even in the noisy cases the standard DNN does not overfit much, most likely because heat diffusion is a very smooth process. We now generate the dataset of surrogate derivatives and test different linear regression models: ordinary least-squares regression, Lasso regression, and Bayesian regression. For the BNN surrogate derivatives, we test both with and without variance weighting (in the latter case the covariance matrix in equation (6) is replaced by a scaled identity matrix).

| Noise | BNN | DNN |
|---|---|---|
|  | 0.0125 | 0.0099 |
|  | 0.0130 | 0.0114 |
|  | 0.0208 | 0.0277 |

First we consider the case where the functional form of the PDE is known (i.e. the heat equation), and we want to infer the heat diffusivity of the system. Thus, the only derivative in the input derivative dataset is the second derivative in space. The recovered diffusivity coefficients for the different models are shown in tables 2, 3 and 4. The BNN does a good job of recovering the ground-truth heat diffusivity and clearly outperforms the DNN model. Introducing the derivative variance in the linear regression helps achieve better accuracy (tables 2 and 3).
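This single-coefficient recovery can be illustrated end-to-end with a simplified stand-in: solving the heat equation on a regular grid and estimating derivatives by finite differences instead of a BNN. The grid size, diffusivity value, and initial condition below are illustrative assumptions:

```python
import numpy as np

# Solve u_t = alpha * u_xx with an explicit finite-difference scheme,
# then recover alpha by regressing u_t on u_xx (one-term least squares).
alpha_true = 0.25
nx, nt = 64, 400
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2 / alpha_true          # respects the explicit stability limit
x = np.linspace(0, 1, nx)

u = np.sin(np.pi * x)                  # initial condition
U = [u.copy()]
for _ in range(nt):
    u_xx = np.zeros_like(u)
    u_xx[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    u = u + dt * alpha_true * u_xx     # Dirichlet ends stay at 0
    U.append(u.copy())
U = np.array(U)                        # shape (nt + 1, nx)

# Finite-difference estimates of u_t and u_xx at interior points.
u_t = (U[2:, 1:-1] - U[:-2, 1:-1]) / (2 * dt)
u_xx = (U[1:-1, 2:] - 2 * U[1:-1, 1:-1] + U[1:-1, :-2]) / dx**2

# One-term least squares: alpha = <u_xx, u_t> / <u_xx, u_xx>.
alpha_hat = (u_xx.ravel() @ u_t.ravel()) / (u_xx.ravel() @ u_xx.ravel())
```

With clean grid data, plain finite differences recover the diffusivity closely; the BNN-based pipeline addresses the harder setting of sparse, noisy, scattered sensors.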

| Noise | Least Squares | Bayesian Regression |
|---|---|---|
| True | | |

| Noise | Least Squares | Bayesian Regression |
|---|---|---|
| True | | |

| Noise | Least Squares | Bayesian Regression |
|---|---|---|
| True | | |

We now consider the case where the underlying PDE is not known and we want to find the spatial derivatives involved in it. We assume spatial derivatives up to order 4, as well as 3 non-linear derivative terms. The discovered PDEs for the different models are shown in tables 5, 6, 7, and 8. For all noise cases, the BNN combined with the variance-weighted BLR (table 5) outperforms all the other baseline models. For larger noise, it slightly overestimates the influence of some of the extraneous spatial derivative terms, but still recovers the ground-truth diffusivity well, and does much better than the other models (tables 6, 7 and 8).

| Noise | Recovered PDE | ( norm) |
|---|---|---|
| True | | |

| Noise | Recovered PDE | ( norm) |
|---|---|---|
| True | | |

| Noise | Recovered PDE | ( norm) |
|---|---|---|
| True | | |

| Noise | Recovered PDE | ( norm) |
|---|---|---|
| True | | |

### 4.2 Burgers Equation

Let us consider the Burgers equation as originally proposed in JMLR:v19:18-046:

(14) |

(15) |

Measurements are recorded at a fixed time interval for each of the 16 sensors; the ground-truth solution is shown in figure 4. The predictions of the BNN and DNN are shown in figures 5 and 6, and table 9 shows the RMSE between the BNN predictive mean, the DNN prediction, and the ground truth over 1000 random input points.

Tables 10, 11 and 12 show the recovered Burgers equation for the DNN, the BNN without variance weighting, and the BNN with variance weighting, respectively, using either least-squares regression or BLR. While the BNN combined with BLR does almost as well as the DNN in the low-noise cases, it completely outperforms the latter at the highest noise level. Here, the BNN is able to avoid overfitting (and yields a lower RMSE than the DNN, as shown in table 9). This can also be observed in figure 6: notice how the standard DNN (green line) becomes oscillatory around noisy data points, while the BNN captures the underlying pattern correctly. Consequently, in this specific case the BNN produces better derivative estimates, ultimately yielding higher PDE recovery accuracy.

| Noise | BNN | DNN |
|---|---|---|
|  | 0.0774 | 0.0283 |
|  | 0.0725 | 0.0298 |
|  | 0.0853 | 0.0919 |

| Noise | Least Squares | Bayesian Regression |
|---|---|---|

| Noise | Least Squares | Bayesian Regression |
|---|---|---|

| Noise | Least Squares | Bayesian Regression |
|---|---|---|
| True | | |

Tables 13, 14, 15 and 16 show the discovered PDE when the functional form is assumed unknown. Again we can see that the BNN with BLR outperforms the standard DNN (table 14) in the noisiest case, and does an excellent job of discovering the non-linear differential term. The BNN does struggle, however, to find the diffusion coefficient. A likely explanation is that, over time, the solution of the Burgers equation in the present case becomes almost equivalent to a piecewise-linear function of space. The diffusion term values thus become very small, making it harder to recover the associated coefficient accurately.

| Noise | Discovered PDE | ( norm) |
|---|---|---|
| True | | |

| Noise | Discovered PDE | ( norm) |
|---|---|---|
| True | | |

| Noise | Discovered PDE | ( norm) |
|---|---|---|
| True | | |

| Noise | Discovered PDE | ( norm) |
|---|---|---|
| True | | |

### 4.3 Korteweg-De Vries Equation

Finally, we consider the Korteweg-de Vries (KdV) equation as originally proposed in JMLR:v19:18-046:

(16) |

(17) |

Measurements are recorded at a fixed time interval for each of the 16 sensors; the ground-truth solution is shown in figure 7. The predictions of the BNN and DNN are shown in figures 8 and 9.

Table 17 shows the prediction RMSE for both the BNN and the DNN. The DNN predictions outperform the BNN's, but the DNN completely fails to recover the ground-truth PDE, as shown in tables 18, 19 and 20. Even in the noiseless case, the DNN does a poor job, and especially struggles to predict the non-linear derivative coefficient (table 18). Conversely, the BNN combined with either the BLR or least-squares regression (table 20) recovers the PDE coefficients much better. This example shows that the accuracy of the neural network predictions is independent of its ability to recover the governing PDE. Indeed, the neural network does not have to be accurate everywhere; it only takes a few correct derivative estimates with well-quantified uncertainty to find the ground-truth PDE.

| Noise | BNN | DNN |
|---|---|---|
|  | 0.1840 | 0.1018 |
|  | 0.1411 | 0.1068 |
|  | 0.1712 | 0.1158 |

| Noise | Least Squares | Bayesian Regression |
|---|---|---|
| True | | |

| Noise | Least Squares | Bayesian Regression |
|---|---|---|
| True | | |

| Noise | Least Squares | Bayesian Regression |
|---|---|---|
| True | | |

Tables 21, 22, 23 and 24 show the discovered PDE when the functional form is assumed unknown. Again, the BNN with BLR outperforms the other models. For all noise cases, however, it overestimates the influence of one of the extraneous derivative terms, and underestimates the coefficients of the third-order derivative and the non-linear term. Indeed, the KdV equation creates strong oscillatory patterns with large gradient values, so higher-order derivatives may be especially hard to estimate correctly. In fact, all models generally achieve better accuracy for the non-linear derivative coefficient (which involves only a first-order derivative) than for the third-order derivative term.

| Noise | Discovered PDE | ( norm) |
|---|---|---|
| True | | |

| Noise | Discovered PDE | ( norm) |
|---|---|---|
| True | | |

| Noise | Discovered PDE | ( norm) |
|---|---|---|
| True | | |

| Noise | Discovered PDE | ( norm) |
|---|---|---|
| True | | |
