# Learning on dynamic statistical manifolds

Hyperbolic balance laws with uncertain (random) parameters and inputs are ubiquitous in science and engineering. Quantification of uncertainty in predictions derived from such laws, and reduction of predictive uncertainty via data assimilation, remain open challenges. This is due to the nonlinearity of the governing equations, whose solutions are highly non-Gaussian and often discontinuous. To ameliorate these issues in a computationally efficient way, we use the method of distributions, which here takes the form of a deterministic equation for the spatiotemporal evolution of the cumulative distribution function (CDF) of the random system state, as a means of forward uncertainty propagation. Uncertainty reduction is achieved by recasting the standard loss function, i.e., the discrepancy between observations and model predictions, in distributional terms. This step exploits the equivalence between minimization of the squared-error discrepancy and of the Kullback-Leibler divergence. The loss function is regularized by adding a Lagrangian constraint enforcing fulfillment of the CDF equation. Minimization is performed sequentially, progressively updating the parameters of the CDF equation as more measurements are assimilated.


## 1 Introduction

Robust and efficient quantification of parametric uncertainty in hyperbolic balance and conservation laws is hampered by their nonlinearity and solution structure, which typically possesses sharp gradients and often exhibits shocks and/or discontinuities. Many uncertainty quantification techniques (e.g., stochastic finite elements and stochastic collocation), which can be orders of magnitude faster than standard Monte Carlo simulations (MCS) when applied to elliptic and parabolic equations, often underperform on hyperbolic problems.

The method of distributions (MD) [1] is an uncertainty quantification technique that is tailor-made for hyperbolic problems with random coefficients and inputs. Its goal is to derive a deterministic partial-differential equation (PDE) for either the probability density function (PDF) or the cumulative distribution function (CDF) of the model output. In the presence of multiplicative noise introduced, e.g., by random parameter fields, MD requires a closure approximation, which is derived either via perturbation expansions or by resorting to phenomenology [2, 3, 4]. The method does not rely on a finite-term representation (e.g., via a truncated Karhunen-Loève expansion) of random parameter fields and, hence, does not suffer from the so-called "curse of dimensionality" [5, 1]; its computational cost is independent of the correlation length of an input parameter [6] and can be orders of magnitude lower than that of MCS [7, 2, 4], and its accuracy increases as the correlation length decreases [8, 1].

While MD enables one to quantify predictive uncertainty in hyperbolic models, assimilation of observations into probabilistic model predictions, e.g., by means of Bayes' rule, facilitates reduction of this uncertainty. Within this framework, the model provides a link between observed quantities and the estimates of the state, filtered through an observational map [9]. Direct application of Bayes' rule is often impractical because of the high dimensionality of a joint PDF of system states, and because of complex relations between parameters and states and between states and observations [10, sec. 10.2]. For these reasons, a plethora of approximation techniques have been proposed. Some of these, e.g., maximum likelihood estimation (MLE) [11] and maximum a posteriori estimation (MAP) [12], aim to identify the mode of a posterior distribution, which can be inadequate if the latter is highly non-Gaussian (e.g., multimodal), as is typical of nonlinear models. Ensemble Kalman filters [13] allow one to handle nonlinear PDEs but assume that their solutions are Gaussian. Other methods, e.g., Markov chain Monte Carlo (MCMC) [14] and particle filters [15], aim at sampling from the posterior directly and obviate the need for the Gaussianity and linearity assumptions. Like direct Bayesian updating, the methods of this class are computationally expensive because they rely on multiple forward solves of PDEs with uncertain (random) coefficients and/or auxiliary functions. Our goal is to eliminate this step by replacing it with MD.

Variational formulation recasts some of the methods described above (MLE, MAP, the analysis step in EnKF) as a minimization problem in which a cost (loss) function contains the average distance between measurements and a model's predictions; parameter estimation is then accomplished by minimizing this loss function with respect to the model's parameters (and their statistical moments). This variational formulation belongs to a broader class of optimization methods, sometimes termed Variational Inference (VI) [16], that approximate Bayesian posterior densities by imposing closeness (in the Kullback-Leibler divergence sense) to the target density. Key innovations of our method are to reformulate the loss function in distributional terms using a different discrepancy metric and to confine both the prior and the posterior distributions to a dynamic statistical manifold defined by a deterministic CDF equation. Minimization is done with respect to variables used to parameterize the closure terms in the CDF equation; these variables are, in turn, expressed in terms of the statistical properties of the uncertain parameters and/or auxiliary functions of the original model.

Resulting PDE-constrained optimization problems can be solved with several techniques [17]. We employ a machine learning approach [18, 19, 20], which approximates a PDE's solution with a neural network whose coefficients are obtained by minimizing the resulting residual. This component of our algorithm places it in the burgeoning field variously known as physics-informed machine learning or data-aware modeling. Its goal is to overcome the scarcity of experimental data inherent in many physical systems by fusing physical constraints and observations. It is worth emphasizing, though, that optimization techniques other than the one mentioned above can be used in our Bayesian data assimilation algorithm.

In section 2, we formulate a data assimilation problem for hyperbolic PDEs with uncertain parameters and/or auxiliary functions, and introduce MD as a forecast step in Bayesian updating. Section 3 contains a novel analysis step, in which MD is used as a constraint to reduce parametric uncertainty; technical details are provided in appendix A. We refer to this combination of forecast and analysis as the data-aware method of distributions (DA-MD). In section 4, we test our approach on a linear inhomogeneous hyperbolic equation; this setting admits both exact and approximate Bayesian updates of the random parameters (either spatially uniform or variable) and, hence, enables us to verify the method’s accuracy. Finally, in section 5, we summarize the main findings and discuss future directions.

## 2 Forecast: Method of Distributions

While the data assimilation approach introduced here is applicable to other problems, we formulate it in section 2.1 for hyperbolic PDEs with uncertain (random) parameters and/or auxiliary functions. This setting simplifies the derivation of a deterministic CDF equation used in section 2.2 as the forecast step in Bayesian data assimilation.

### 2.1 Problem Formulation

We consider a smooth state variable $u(\mathbf{x},t)$, whose dynamics is governed by a nonlinear hyperbolic PDE

$$\frac{\partial u}{\partial t}+\nabla\cdot\mathbf{q}(u;\boldsymbol{\theta}_q)=r(u;\boldsymbol{\theta}_r),\qquad \mathbf{x}\in\Omega,\quad t>0. \tag{1a}$$

This equation is subject to the initial condition

$$u(\mathbf{x},t=0)=u_0(\mathbf{x}),\qquad \mathbf{x}\in\Omega, \tag{1b}$$

and, if the $d$-dimensional domain $\Omega$ is bounded, to appropriate boundary conditions along the domain boundary $\partial\Omega$. The flux, $\mathbf{q}$, and the source term, $r$, are parameterized by $\boldsymbol{\theta}_q$ and $\boldsymbol{\theta}_r$, respectively. These real-valued parameters can either be constant or vary in space ($\mathbf{x}$) and time ($t$). The functions $\mathbf{q}$ and $r$ are either linear or nonlinear, as long as the solution of (1) does not develop shocks.¹ For example, $u$ is the concentration of a reactive solute advected by a flow velocity $\mathbf{v}$, while undergoing chemical transformations; in this setting, $\mathbf{q}=\mathbf{v}u$ is the advective flux parameterized by $\boldsymbol{\theta}_q$, and $r$ represents a chemical reaction parameterized by a reaction rate constant.

¹ The presence of shocks and discontinuities complicates the derivation of CDF equations [4, 21, 22], obfuscating our focus on data assimilation.

Incomplete or noisy measurements of the parameters $\boldsymbol{\theta}=\{\boldsymbol{\theta}_q,\boldsymbol{\theta}_r\}$ render them uncertain; this uncertainty is quantified by treating $\boldsymbol{\theta}$ as random fields and random variables. Additionally, auxiliary functions, such as the initial state $u_0(\mathbf{x})$ and boundary functions, are uncertain/random. In the following, the complete set of random inputs comprises both $\boldsymbol{\theta}$ and the auxiliary functions. This randomness renders $u(\mathbf{x},t)$, a solution of (1), random as well. Rather than computing low statistical moments of $u$ (e.g., its ensemble mean and standard deviation, which are commonly used to obtain an unbiased estimator of a system's dynamics and to quantify the corresponding predictive uncertainty, respectively), our goal is to compute its one-point CDF

$$F_u(U;\mathbf{x},t)=\mathbb{P}[u(\mathbf{x},t)\le U],$$

where $U$ is a deterministic value (outcome) of the random variable $u$. The value space $\Omega_U$ of the random variable $u$ identifies the support of the CDF $F_u$. The latter can be either infinite ($\Omega_U=\mathbb{R}$, with $U_{\min}=-\infty$ and $U_{\max}=+\infty$) or finite ($\Omega_U=[U_{\min},U_{\max}]$ such that $F_u(U_{\min};\cdot)=0$ and $F_u(U_{\max};\cdot)=1$).

The model (1) is supplemented with measurements of the state variable $u$ collected at $N_{\text{meas}}$ selected space-time points $(\mathbf{x},t)_m$ with $m=1,\dots,N_{\text{meas}}$. These data, $d_m$, are assumed to differ from the corresponding model predictions by a random measurement error $\varepsilon_m$,

$$d_m=u[(\mathbf{x},t)_m]+\varepsilon_m,\qquad m=1,\dots,N_{\text{meas}}. \tag{2}$$

The measurement errors are assumed to have zero mean, $\langle\varepsilon_m\rangle=0$, and to be mutually uncorrelated, $\langle\varepsilon_m\varepsilon_n\rangle=0$ for all $m\neq n$. A complete probabilistic description of the data is encapsulated in the PDF $f_L(d|u)$, which is also known as the likelihood function. In the absence of measurement errors, the observational PDF is given by the Dirac distribution $\delta(\cdot)$, i.e., $f_L(d_m|u)=\delta(d_m-u)$.
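As a concrete illustration, the observation model (2) can be simulated as follows. This is a minimal sketch: the decaying state function and the noise level are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def synthetic_measurements(u, points, sigma_eps, seed=None):
    """Generate data d_m = u[(x,t)_m] + eps_m, Eq. (2), with zero-mean,
    mutually uncorrelated Gaussian measurement errors."""
    rng = np.random.default_rng(seed)
    u_true = np.array([u(x, t) for (x, t) in points])
    eps = rng.normal(0.0, sigma_eps, size=len(points))
    return u_true + eps

# hypothetical state: pure exponential decay u(x,t) = exp(-t)
d = synthetic_measurements(lambda x, t: np.exp(-t),
                           [(0.1, 0.0), (0.1, 1.0)], sigma_eps=0.05, seed=1)
```

Setting `sigma_eps = 0` recovers the error-free case, in which the data coincide with the model predictions at the measurement points.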

### 2.2 CDF Equation

Direct numerical computation of the CDF $F_u(U;\mathbf{x},t)$, e.g., via Monte Carlo simulations of (1), is computationally expensive. Instead, we use MD to derive a $(d+1)$-dimensional linear PDE for $F_u$ (see appendix A for details),

$$\frac{\partial F_u}{\partial t}+\mathbf{Q}(U;\mathbf{x},t)\cdot\tilde\nabla F_u=\tilde\nabla\cdot\bigl[\mathbf{D}(U;\mathbf{x},t)\tilde\nabla F_u\bigr],\qquad(\mathbf{x},U)\in\tilde\Omega,\quad t>0. \tag{3}$$

This deterministic PDE is defined in the augmented space $\tilde\Omega=\Omega\times\Omega_U$, with $\tilde\nabla$ denoting the gradient with respect to the augmented coordinates $(\mathbf{x},U)$. It is subject to initial and boundary conditions that reflect uncertainty in the initial and boundary conditions for the original problem (1). Additional boundary conditions are defined for $U=U_{\min}$ and $U=U_{\max}$, namely $F_u(U_{\min};\mathbf{x},t)=0$ and $F_u(U_{\max};\mathbf{x},t)=1$; they stem from the definition of a CDF.

In general, derivation of (3) requires a closure approximation, such as the perturbation expansion used in appendix A. Notable exceptions of practical significance include a scenario of random inputs (initial and boundary conditions) but deterministic parameters $\boldsymbol{\theta}$;² in this case (3) is exact and its coefficients are given by (appendix A)

$$\mathbf{Q}(U;\mathbf{x},t)=\{\dot{\mathbf{q}}(U;\boldsymbol{\theta}_q),\,r(U;\mathbf{x},t)\},\qquad \mathbf{D}(U;\mathbf{x},t)=0, \tag{4}$$

where $\dot{\mathbf{q}}=\mathrm{d}\mathbf{q}/\mathrm{d}U$. When the model parameters are random, i.e., when the CDF equation (3) is inexact, the coefficients $\mathbf{Q}$ and $\mathbf{D}$ depend on a set of statistical parameters that characterize the randomness of $\boldsymbol{\theta}$. This set consists of the shape parameters of the PDFs of $\boldsymbol{\theta}$, i.e., their means, variances, and correlation lengths. Together with the statistical characteristics of the random auxiliary functions, these parameters represent the coordinates $\boldsymbol{\varphi}$ of a manifold of distributions, whose dynamics is governed by the CDF equation (3). Each point in this finite-dimensional coordinate space uniquely identifies a distribution [24].

² When both the inputs and parameters are deterministic, the strategy of transforming a nonlinear $d$-dimensional hyperbolic PDE into its linear $(d+1)$-dimensional counterpart is referred to as the kinetic formulation of a hyperbolic conservation law [23].

The use of perturbative closures to derive a CDF equation raises several questions about its accuracy and robustness, which have been the subject of previous investigations. First, even though the coefficient of variation (CV) of the model parameters serves as a perturbation parameter, the resulting CDF equations for many applications remain accurate for relatively large values of CV [2, 8, 25]. Second, the coefficients of perturbation-based CDF equations, such as $\mathbf{Q}$ and $\mathbf{D}$ in (3), depend only on the low-order statistical moments of the model parameters, rather than their full PDFs. By using an advection-reaction equation as a test case, we show in appendix A that the resulting CDF equation is distributionally robust, giving consistent predictions of the system state's CDF regardless of whether the model coefficient (spatially varying reaction rate) has a Gaussian, log-normal, or uniform PDF. Third, the accuracy of perturbation-based CDF equations depends on the correlation lengths of the model parameters: these equations are often exact for white noise (zero correlation) and become progressively less so as the correlation lengths increase. If the correlation lengths are large, perturbation-based closures can be replaced with truncated Karhunen-Loève expansions of the random parameter fields, leading to accurate/exact CDF equations [6].

In summary, we use the CDF equation (3) as an efficient forecasting tool, which propagates parametric uncertainty in space and in time through a physical model. It represents a counterpart of a set of ensemble members or particles in the context of the ensemble Kalman filter or particle filter, respectively. Its accuracy and computational efficiency vis-à-vis Monte Carlo simulations have been thoroughly investigated [2, 4, 7].

## 3 Analysis: Sequential Bayesian Update on Dynamic Manifolds

We use MD as a constraint for the analysis step, during which observations of the system state are used to refine the knowledge of the meta-parameters $\boldsymbol{\varphi}$. Specifically, our novel analysis step involves minimization of the discrepancy between the "observational" CDF $\hat F_u(U;(\mathbf{x},t)_m)$ in each measurement location $(\mathbf{x},t)_m$ and the corresponding "estimate" CDF $F_u(U;\boldsymbol{\varphi};(\mathbf{x},t)_m)$:

$$\boldsymbol{\varphi}^{(m)}=\arg\min_{\boldsymbol{\varphi}}\;\bigl\|\hat F_u(U;(\mathbf{x},t)_m)-F_u(U;\boldsymbol{\varphi};(\mathbf{x},t)_m)\bigr\|^2, \tag{5}$$

where

$$\bigl\|\hat F_u(U;(\mathbf{x},t)_m)-F_u(U;\boldsymbol{\varphi};(\mathbf{x},t)_m)\bigr\|_2=\left(\int_{\Omega_U}\bigl(\hat F_u(U;(\mathbf{x},t)_m)-F_u(U;\boldsymbol{\varphi};(\mathbf{x},t)_m)\bigr)^2\,\mathrm{d}U\right)^{1/2}.$$

The analysis step, i.e., minimization of (5), is performed sequentially for each observation $d_m$, so that all the distributions above are univariate. Formulation (5) is at the core of our data assimilation strategy and requires a thorough explanation.

###### Remark 1

MD constraint: The estimate distribution $F_u(U;\boldsymbol{\varphi};(\mathbf{x},t)_m)$ is a solution of the CDF equation (3) subject to appropriate initial/boundary conditions. This boundary value problem is parameterized by the set of parameters $\boldsymbol{\varphi}$, over which the discrepancy minimization is performed. In other words, (5) identifies the parameters of the CDF equation that yield a CDF in the measurement location as close as possible to the observational CDF $\hat F_u$. This implies that the minimization is performed on the manifold of distributions obeying the CDF equation. This observation is further elaborated upon in section 3.2. Reliance on MD obviates the need for both the Gaussianity assumption for the system states and the linearity requirement for the physical model, as long as it is possible to develop a reliable and accurate CDF equation.

###### Remark 2

Observational CDFs: We construct the observational CDF,

$$\hat F_u(U;(\mathbf{x},t)_m)=\int_{U_{\min}}^{U}\hat f_u(U';(\mathbf{x},t)_m)\,\mathrm{d}U',$$

via a Bayesian update of the corresponding PDF at each space-time measurement point $(\mathbf{x},t)_m$:

$$\hat f_u(U;(\mathbf{x},t)_m|d_m)\propto f_L(d_m|u[(\mathbf{x},t)_m]=U)\,f_u(U;\boldsymbol{\varphi}^{(m-1)};(\mathbf{x},t)_m). \tag{6}$$

The PDF $f_u(U;\boldsymbol{\varphi}^{(m-1)};(\mathbf{x},t)_m)$ is computed from a solution of the CDF equation (3) whose parameters $\boldsymbol{\varphi}^{(m-1)}$ were computed in the previous assimilation step. This procedure provides a local update of the system state's PDF in the sense that it yields no information on the surrounding locations nor on the future time evolution of the state.

###### Remark 3

Sequential update: The sequential update of the observational PDF allows us to obtain final estimates for the MD parameters that are conditional on all assimilated observations [26]. It is employed both to reduce the dimensionality of the CDFs/PDFs involved and to facilitate real-time updating of the estimates as new measurements become available [10, p. 101]. At each step, i.e., for each data point $d_m$, we proceed as follows.

• For $m=1$, the MD parameters $\boldsymbol{\varphi}^{(0)}$ are initialized to define the prior and to compute (6). The normalization constant in (6) is obtained by (numerical) integration over $\Omega_U$.

• For $m>1$, each update (6) accounts for conditioning on all previous measurements up to the current one, $d_{1:m}$, such that

$$\hat f_u(U;(\mathbf{x},t)_m|d_{1:m})\propto f_L(d_{1:m}|U)\,f_u(U;\boldsymbol{\varphi}^{(m-1)};(\mathbf{x},t)_m). \tag{7}$$

This step implies that the prior distribution in the current measurement location obeys the CDF equation (3). If observation errors are mutually uncorrelated, then $f_L(d_{1:m}|U)=\prod_{i=1}^{m}f_L(d_i|U)$ and

$$\hat f_u(U;(\mathbf{x},t)_m|d_{1:m})\propto\prod_{i=1}^{m-1}f_L(d_i|U)\,f_L(d_m|U)\,f_u(U;\boldsymbol{\varphi}^{(m-1)};(\mathbf{x},t)_m)\propto f_L(d_m|U)\,\hat f_u(U;(\mathbf{x},t)_m|d_{1:m-1}). \tag{8}$$

Here, $\hat f_u(U;(\mathbf{x},t)_m|d_{1:m-1})$ is approximated by a solution of the CDF equation at $(\mathbf{x},t)_m$ with parameters from the previous iterative step. In other words, a solution of the CDF equation (3) with parameters $\boldsymbol{\varphi}^{(m-1)}$ serves as prior.

At the end of this sequential assimilation procedure, the CDF equation (3) with parameters $\boldsymbol{\varphi}^{(N_{\text{meas}})}$ allows us to predict the future dynamics of the CDF $F_u$, i.e., to make a probabilistic forecast.
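The sequential procedure above can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's implementation: a two-parameter Gaussian family stands in for the manifold of CDF-equation solutions, and a coarse grid search stands in for the gradient-based minimization of (5).

```python
import numpy as np

def norm_pdf(U, mu, s):
    return np.exp(-(U - mu)**2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))

def trap(y, U):  # trapezoid rule
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(U)))

def cdf_from_pdf(U, f):
    F = np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(U))])
    return F / F[-1]

def assimilate(data, mu, s, s_eps, U):
    """Sequential analysis on a toy Gaussian manifold with coordinates
    (mu, s): Bayes-update the point-wise PDF (cf. Eq. 6), then project the
    observational CDF back onto the manifold by minimizing the squared
    L2 (Cramer) discrepancy, mimicking Eq. (5)."""
    for d in data:
        f_prior = norm_pdf(U, mu, s)               # prior on the manifold
        f_post = norm_pdf(d, U, s_eps) * f_prior   # unnormalized Bayes update
        F_hat = cdf_from_pdf(U, f_post)            # observational CDF
        best = (np.inf, mu, s)
        for m in np.linspace(mu - 2, mu + 2, 81):          # coarse grid search
            for sig in np.linspace(0.05, 2.0, 40):
                F = cdf_from_pdf(U, norm_pdf(U, m, sig))
                loss = trap((F_hat - F)**2, U)
                if loss < best[0]:
                    best = (loss, m, sig)
        _, mu, s = best
    return mu, s

U = np.linspace(-6.0, 6.0, 1201)
mu_post, s_post = assimilate([1.0, 0.9, 1.1], mu=0.0, s=1.0, s_eps=0.2, U=U)
```

With three noisy observations near 1.0, the manifold coordinates move from the prior (0, 1) toward a mean close to 1 and a much smaller standard deviation, as the sequential conditioning suggests.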

###### Remark 4

Choice of the discrepancy metric: Our reliance on the squared $L_2$ norm (a.k.a. the Cramér distance [27]),

$$\|F_1(U)-F_2(U)\|_2^2=\int_{U_{\min}}^{U_{\max}}[F_1(U)-F_2(U)]^2\,\mathrm{d}U,$$

as a measure of discrepancy between any two CDFs, $F_1$ and $F_2$, facilitates numerical minimization of the loss function in (5) with a technique described in section 3.1 below. We deploy it in place of the commonly used Kullback-Leibler (KL) divergence,

$$D_{\text{KL}}(F_1,F_2)=\int_{U_{\min}}^{U_{\max}}f_1(U)\ln\frac{f_1(U)}{f_2(U)}\,\mathrm{d}U,$$

for the following reasons. According to Pinsker's inequality [28, 29], $D_{\text{KL}}(F_1,F_2)\ge\frac{1}{2}\|f_1-f_2\|_1^2$, where $\|\cdot\|_1$ is the $L_1$ norm. Since $\|f_1-f_2\|_2\le\|f_1-f_2\|_1$ [30, Prop. 1.5], this yields $D_{\text{KL}}(F_1,F_2)\ge\frac{1}{2}\|f_1-f_2\|_2^2$. Since the two metrics share the same minimum (for $F_1=F_2$ both are equal to zero), a solution of the minimization problem (5) would also minimize the corresponding loss function based on the KL divergence. Moreover, it is advantageous to employ MD in its CDF form, rather than its PDF form, because of the straightforward assignment of the boundary conditions along $\partial\Omega_U$ and smoother solutions.
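The relation between the two metrics is easy to probe numerically. The check below compares the Cramér distance and the KL divergence for two discretized Gaussians; all grids and parameter values are chosen for illustration only.

```python
import numpy as np

U = np.linspace(-10.0, 10.0, 4001)
dU = U[1] - U[0]

def gauss_pdf(mu, s):
    return np.exp(-(U - mu)**2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))

def to_cdf(f):
    F = np.cumsum(f) * dU
    return F / F[-1]

def cramer_sq(F1, F2):   # squared L2 (Cramer) distance between two CDFs
    return float(np.sum((F1 - F2)**2) * dU)

def kl_div(f1, f2):      # KL divergence between the corresponding PDFs
    ratio = np.maximum(f1, 1e-300) / np.maximum(f2, 1e-300)
    return float(np.sum(f1 * np.log(ratio)) * dU)

f1, f2 = gauss_pdf(0.0, 1.0), gauss_pdf(0.5, 1.0)
# closed form for two unit-variance Gaussians: D_KL = (mu1 - mu2)^2 / 2 = 0.125
```

Both metrics vanish if and only if the two distributions coincide on the grid, which is the property the argument above relies on.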

###### Remark 5

Relationship to Variational Inference Techniques: Our method aims at approximating posterior densities in a Bayesian sense via a minimization procedure. As such, it connects with VI techniques, which use optimization to identify one joint density—chosen to belong to a specified family of approximate densities—that is close to the target posterior in KL-divergence terms [16]. We choose a physics-based family of plausible distributions, which obey the CDF equation parameterized with a finite set of parameters. Constraining distributions to a dynamic manifold allows us to update single-point distributions sequentially: updated parameters can be used, in combination with the CDF equation, to obtain forecast predictions in different space-time locations. Moreover, it drastically reduces (to one) the dimensionality of the posterior distribution to be updated at each assimilation step.³

³ In this regard, we mention the work of [24], where the reduction in complexity of statistical models is quantified by exploiting relevant embedding constraints specifying geodesic motion on curved statistical manifolds.

### 3.1 Loss Function Minimization

The PDE-constrained optimization problem (5) can be solved with several techniques [17]. If the CDF equation (3) admits an analytical solution, e.g., if the system parameters are deterministic and the initial and/or boundary functions are random, $F_u$ can be expressed as a (semi)explicit function of the statistical parameters, $\boldsymbol{\varphi}_0$ and $\boldsymbol{\varphi}_b$, characterizing the initial and boundary CDFs $F_0$ and $F_b$, respectively. Section 4.1 deals with such a scenario; it serves to verify the reliability of our approach by comparing its performance with that of the standard Bayesian update.

When the CDF equation (3) has to be solved numerically, we follow [31, 19] to approximate its solution, $F_u$, with a neural network $F_{\text{NN}}$ whose coefficients (weights and biases) are computed by minimizing the residual

$$\mathcal{R}=\frac{\partial F_{\text{NN}}}{\partial t}+(\mathbf{Q}-\tilde\nabla\cdot\mathbf{D})\cdot\tilde\nabla F_{\text{NN}}-\mathbf{D}\,\tilde\Delta F_{\text{NN}} \tag{9}$$

at a set of $N_{\text{res}}$ points $(\mathbf{x},t)_r$; the initial and boundary conditions are enforced at a finite set of $N_{\text{aux}}$ points. The derivatives in (9) are computed via automatic differentiation, as implemented in TensorFlow [32]. This procedure replaces the PDE-constrained minimization problem (5) with an optimization problem

$$\boldsymbol{\varphi}^{(m)}=\arg\min_{\boldsymbol{\varphi}}\left[\bigl\|\hat F_u-F_{\text{NN}}\bigr\|_2^2+\text{MSE}_R(\boldsymbol{\varphi})+\text{MSE}_B(\boldsymbol{\varphi})\right], \tag{10}$$

where

$$\text{MSE}_R(\boldsymbol{\varphi})=\frac{1}{N_{\text{res}}}\sum_{r=1}^{N_{\text{res}}}\bigl\|\mathcal{R}((\mathbf{x},t)_r;\boldsymbol{\varphi})\bigr\|^2,\qquad \text{MSE}_B(\boldsymbol{\varphi})=\frac{1}{N_{\text{aux}}}\sum_{i=1}^{N_{\text{aux}}}\bigl\|F_{\text{NN}}((U,\mathbf{x},t)_i,\boldsymbol{\varphi})-F_{\text{inp}}((U,\mathbf{x},t)_i)\bigr\|^2,$$

and $F_{\text{inp}}$ represents the prescribed CDFs of either the initial state or the boundary functions along $\partial\tilde\Omega$. The NN function approximation via minimization enjoys convergence guarantees in the chosen norm, e.g., [33, 34]. A solution of (10) provides a CDF surrogate (a "trained" NN) and the set of optimal parameters $\boldsymbol{\varphi}^{(m)}$. The surrogate can then be used to update predictions and for forecasts (not pursued here).
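The assembly of the residual term can be illustrated on a degenerate case of (9) with scalar $\mathbf{Q}=1$ and $\mathbf{D}=0$, for which an exact solution is available. This is a sketch only: central finite differences stand in for the automatic differentiation applied to the NN surrogate, and the surrogate itself is replaced by a known function.

```python
import numpy as np

def residual(F, x, t, h=1e-5):
    """Residual of Eq. (9) for the degenerate case Q = 1, D = 0 in one
    spatial dimension, i.e., R = dF/dt + dF/dx, with derivatives
    approximated by central finite differences."""
    dFdt = (F(x, t + h) - F(x, t - h)) / (2.0 * h)
    dFdx = (F(x + h, t) - F(x - h, t)) / (2.0 * h)
    return dFdt + dFdx

def mse_r(F, pts):
    """Mean-squared residual over a set of collocation points."""
    return float(np.mean([residual(F, x, t)**2 for (x, t) in pts]))

# any traveling wave F(x - t) solves dF/dt + dF/dx = 0 exactly
F_exact = lambda x, t: 0.5 * (1.0 + np.tanh(x - t))
pts = [(0.3, 0.1), (1.0, 0.5), (2.0, 0.2)]
```

For the exact traveling-wave solution, the mean-squared residual at the collocation points vanishes up to discretization and roundoff error, which is the behavior the training of $F_{\text{NN}}$ targets.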

### 3.2 Information-Geometric Interpretation

A family of distributions satisfying the CDF equation (3) defines a dynamic statistical manifold $\mathcal{M}$. Each point in this space, with coordinates $\tilde{\boldsymbol{\varphi}}$, uniquely identifies a physics-informed CDF of the model's output at each space-time point $(\mathbf{x},t)$. The manifold is differentiable in all coordinate directions and equipped with a Riemannian metric. The latter takes the form of the Fisher information metric (FIM), a matrix whose components are [35, p. 33]

$$g_{jk}(\tilde{\boldsymbol{\varphi}})=\int\frac{\partial\ln f_u(U;\tilde{\boldsymbol{\varphi}})}{\partial\tilde\varphi_j}\,\frac{\partial\ln f_u(U;\tilde{\boldsymbol{\varphi}})}{\partial\tilde\varphi_k}\,f_u(U;\tilde{\boldsymbol{\varphi}})\,\mathrm{d}U,\qquad j,k=1,\dots,d+1+N_\varphi, \tag{11}$$

where $d+1+N_\varphi$ is the number of manifold coordinates, with $N_\varphi$ statistical parameters in the CDF equation (3).⁴ The local curvature of the manifold represents a Euclidean metric (a distance on the manifold $\mathcal{M}$) upon an appropriate change of variables. FIM quantifies the differential amount of information between two infinitesimally close points on a manifold; it is formally computed as the second derivative of the KL divergence of the distributions $f_u(U;\tilde{\boldsymbol{\varphi}})$ and $f_u(U;\tilde{\boldsymbol{\varphi}}+\mathrm{d}\tilde{\boldsymbol{\varphi}})$ with $\mathrm{d}\tilde{\boldsymbol{\varphi}}\to0$ [36].

⁴ This definition assumes the existence of the PDF $f_u$; for hyperbolic PDEs (1) with smooth solutions, it does exist and satisfies a PDF equation corresponding to the CDF equation (3) [1, 6, 8].
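As a concrete example, the quadrature in (11) can be evaluated for the familiar Gaussian family, whose FIM is known in closed form. This is an illustration, not part of the paper's workflow.

```python
import numpy as np

def fim_gaussian(mu, s, n=40001, half_width=12.0):
    """Fisher information metric, Eq. (11), for the Gaussian family
    f_u(U; mu, sigma), evaluated by quadrature of the score outer product.
    The closed form is g = diag(1/sigma^2, 2/sigma^2)."""
    U = np.linspace(mu - half_width * s, mu + half_width * s, n)
    dU = U[1] - U[0]
    f = np.exp(-(U - mu)**2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))
    score = [(U - mu) / s**2,                 # d ln f / d mu
             (U - mu)**2 / s**3 - 1.0 / s]   # d ln f / d sigma
    g = np.array([[np.sum(a * b * f) * dU for b in score] for a in score])
    return g
```

For `mu = 0, s = 0.5` the quadrature reproduces `diag(4, 8)` to high accuracy, with vanishing off-diagonal terms by symmetry of the integrand.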

The significance of FIM and its geometric implications [37] will be explored elsewhere. Here we focus on the calculation of the information gain achieved during each step of the data assimilation process. Specifically, we express the $m$th analysis step in geometrical terms as a change of the coordinates on the statistical manifold $\mathcal{M}$, from $\tilde{\boldsymbol{\varphi}}^{(m-1)}$ to $\tilde{\boldsymbol{\varphi}}^{(m)}$, and quantify the corresponding information gain by the KL divergence between the corresponding distributions. This quantity is computed as a post-processing step for comparative analysis.

## 4 Numerical Experiments

Let us consider a scalar $u(x,t)$, whose dynamics satisfies a one-dimensional dimensionless advection-reaction equation

$$\frac{\partial u}{\partial t}+\frac{\partial q(u)}{\partial x}=r(x,u),\qquad q\equiv vu,\quad r\equiv-k(x)u;\qquad x>0,\quad t>0, \tag{12a}$$

subject to initial and boundary conditions

$$u(x,t=0)=u_0;\qquad u(x=0,t)=u_b+s(t),\quad s(t)=a\sin(2\pi\nu t+\phi). \tag{12b}$$

This problem describes, e.g., advection of a solute that undergoes linear decay; in this example $u$ represents normalized solute concentration, $v$ is normalized flow velocity along a streamline, and $k$ is the normalized reaction rate. In the simulations reported below, the coefficients $v$, $a$, $\nu$, and $\phi$ are assigned constant deterministic values. In the first test, $k$ is a deterministic constant, while the uniform initial state $u_0$ and baseline boundary state $u_b$ are random variables. In the other two tests, both $u_0$ and $u_b$ are deterministic, and $k$ is alternatively treated either as a random constant or as a spatially varying random field.

In all three experiments, data sets are generated in accordance with (2) by adding Gaussian white noise, $\varepsilon_m$, to a solution of (12) with a given choice of model parameters. The likelihood function $f_L$ is therefore assumed to be Gaussian.
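For a constant reaction rate, (12) can be solved exactly by the method of characteristics, which is one way to produce such synthetic data. In this sketch, all parameter values are illustrative placeholders, since the paper does not report its settings here.

```python
import numpy as np

def u_exact(x, t, v=1.0, k=1.0, u0=1.0, ub=0.5, a=0.1, nu=1.0, phi=0.0):
    """Solution of Eq. (12) for a constant reaction rate k, obtained with
    the method of characteristics. Parameter values are hypothetical."""
    s = lambda tau: a * np.sin(2.0 * np.pi * nu * tau + phi)
    if x >= v * t:                                   # characteristic from t = 0
        return u0 * np.exp(-k * t)
    return (ub + s(t - x / v)) * np.exp(-k * x / v)  # from the inlet x = 0

# synthetic data per Eq. (2): add Gaussian noise to u_exact at chosen points
rng = np.random.default_rng(0)
d = [u_exact(x, t) + rng.normal(0.0, 0.05) for (x, t) in [(2.0, 1.0), (0.5, 1.0)]]
```

States ahead of the boundary signal ($x \ge vt$) decay from the initial condition, while states behind it carry the decayed, time-shifted boundary forcing.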

The CDF equation for (12) was derived, and the accuracy and robustness of the underlying closure approximations analyzed, in [2] for the three scenarios described above. Appendix A contains a brief summary of these results.

### 4.1 Uncertain Initial and Boundary Conditions

Let $u_0$ and $u_b$ be uncorrelated random variables with (prior) PDFs $f_{u_0}$ and $f_{u_b}$. Then the random initial and boundary states are characterized by respective CDFs $F_0$ and $F_b$ with shape parameters $\boldsymbol{\varphi}_0$ and $\boldsymbol{\varphi}_b$. In the absence of other sources of uncertainty, the CDF of the random state $u$ in (12) satisfies exactly a PDE

$$\frac{\partial F_u}{\partial t}+\frac{\partial F_u}{\partial x}-kU\frac{\partial F_u}{\partial U}=0 \tag{13a}$$

subject to initial and boundary conditions

$$F_u(U;x,0)=F_0,\qquad F_u(U;0,t)=F_b,\qquad F_u(U_{\min};x,t)=0,\qquad F_u(U_{\max};x,t)=1. \tag{13b}$$

This boundary-value problem admits an analytical solution, with either $F_0$ or $F_b$ propagated along deterministic characteristic lines. The dynamic manifold of the resulting CDFs has coordinates $\boldsymbol{\varphi}=\{\boldsymbol{\varphi}_0,\boldsymbol{\varphi}_b\}$. The analysis step of DA-MD takes place on this statistical manifold. Each measurement contributes to uncertainty reduction of either $u_0$ or $u_b$ (i.e., sharpens either $F_0$ or $F_b$), depending on the data location $(x,t)_m$. Half of these measurements are collected at locations informed by the initial condition, and the other half at locations informed by the boundary condition.

To verify the accuracy of DA-MD, we compare its predictions of the optimal parameters with those given by the Bayesian posterior joint PDF

$$\hat f_{u_0,u_b}(U_0,U_b|d_{1:N_{\text{meas}}})=\hat f_{u_0}(U_0|d_{1:N_{\text{meas}}})\,\hat f_{u_b}(U_b|d_{1:N_{\text{meas}}})\propto f_L\bigl(d_{1:N_{\text{meas}}}|u[(x,t)_{1:N_{\text{meas}}};U_0,U_b]\bigr)\,f_{u_0}(U_0)\,f_{u_b}(U_b)\approx\prod_{m=1}^{N_{\text{meas}}}f_L\bigl(d_m|u[(x,t)_m;U_0,U_b]\bigr)\,f_{u_0}(U_0)\,f_{u_b}(U_b). \tag{14}$$

To facilitate the Bayesian update, we take $f_{u_0}$ and $f_{u_b}$ to be Gaussian, fully specified by their respective means and standard deviations. Then, (14) yields analytically computable Gaussian posteriors $\hat f_{u_0}$ and $\hat f_{u_b}$. In what follows, we compare those with the posterior parameters obtained via DA-MD. These posterior DA-MD parameters uniquely define the coefficients of the CDF equation (13), which then serves as an updated predictive tool. Equation (13) has an analytical solution although, in general, numerical minimization in (10) needs to be employed to compute its approximation $F_{\text{NN}}$.
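The analytically computable Gaussian posterior can be sketched for a single random initial state observed through the exponential-decay map of (12). The function below is an illustration of the conjugate linear-Gaussian update that makes this benchmark tractable; the notation is hypothetical.

```python
import numpy as np

def posterior_u0(data, t_obs, k, mu0, s0, s_eps):
    """Exact Gaussian posterior for a random initial state u0 observed
    through the linear map d_m = u0 * exp(-k * t_m) + eps_m, with a
    Gaussian prior N(mu0, s0^2) and error std s_eps."""
    a = np.exp(-k * np.asarray(t_obs, dtype=float))  # observation coefficients
    prec = 1.0 / s0**2 + np.sum(a * a) / s_eps**2    # posterior precision
    mean = (mu0 / s0**2 + np.sum(a * np.asarray(data, dtype=float)) / s_eps**2) / prec
    return mean, 1.0 / np.sqrt(prec)
```

Because the map is linear in $u_0$ and all densities are Gaussian, the posterior stays Gaussian, and each additional observation increases the posterior precision additively.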

Figure 1 exhibits the prior and posterior distributions for $u_0$ (those for $u_b$ behave similarly) computed with the alternative data assimilation strategies. The left panel represents these distributions as coordinates on the statistical manifold of Gaussian distributions, whereas the right panel shows them as PDFs in the value space. The Bayesian update and the DA-MD approach yield almost identical results after assimilation of the same set of measurements, sharpening the distribution of the parameters around the true value.

Similar to the left panel in fig. 1, the prior and posterior CDFs of the state variable $u$, both obeying the CDF equation (13), are represented as points on the statistical manifold with coordinates $\boldsymbol{\varphi}^{(0)}$ and $\boldsymbol{\varphi}^{(N_{\text{meas}})}$, respectively. The amount of information used during the analysis and transferred from the measurements to the conditional predictions can be thought of as the distance between these points: the information gain from prior to posterior is quantified by the KL divergence between these distributions (section 3.2). For the same prior and the same observations, DA-MD and the Bayesian update yield almost identical KL discrepancies. Moreover, the KL divergence does not vary within the assimilation regions, i.e., it remains constant in the regions of the space-time domain where $u$ depends on either $u_0$ or $u_b$. The KL divergence also allows one to compare the informational gain from different sets of observations: doubling the number of measurements yields, in the assimilation regions informed by either the initial or the boundary conditions, a gain in KL terms of 7% and 9%, respectively.

### 4.2 Uncertain Reaction Rate

In the following two test cases, we treat the uncertain coefficient $k$ in (12) first as a random constant and then as a random field. The auxiliary variables $u_0$ and $u_b$ in (12b) are taken to be deterministic, so that the CDF equation (3) is subject to initial and boundary conditions

$$F_u(U;x,0)=H(U-u_0),\qquad F_u(U;0,t)=H(U-u_b-s(t)),$$

where $H(\cdot)$ is the Heaviside step function.

#### 4.2.1 Random variable

The coefficients (4) in the CDF equation (3) take the form (appendix A)

 (15)

where $t_*$ is defined in appendix A, and $\langle k\rangle$ and $\sigma_k$ are the ensemble mean and standard deviation of $k$, respectively. The coordinates of the dynamic manifold of the approximated CDFs are $\boldsymbol{\varphi}=\{\langle k\rangle,\sigma_k\}$. The CDF equation is solved via finite volumes (FV) using the FiPy solver [38], with prescribed discretization elements and domain size.

Minimization of (10) is done using the L-BFGS-B method implemented in TensorFlow [32] with a prescribed convergence threshold for the loss function value. The solution of the CDF equation (3), whose coefficients are given by (15), is represented by a fully connected NN with fixed architecture (9 layers, 20 nodes per hidden layer) and a sigmoidal activation function (hyperbolic tangent). Weights and biases of the NN are initialized at the beginning of the sequential procedure by approximating a solution of the CDF equation with the prior statistical parameters $\boldsymbol{\varphi}^{(0)}$. Successive iterations are initialized with weights and biases from the previous step. This procedure considerably accelerates the identification of the target parameters. Zero residuals are enforced at $N_{\text{res}}$ locations within the space-time domain, whereas initial and boundary conditions are imposed at $N_{\text{aux}}$ locations. Furthermore, we enforce non-negativity of $\sigma_k$.

###### Remark 6

The FV approximation is used to construct the observational CDFs, whereas the NN approximation is used on a sparse set of points for numerical gradient-based minimization. The NN surrogate solution of the CDF equation (3) could also be used as a prior for the next assimilation step, with the advantage of being virtually free of artificial diffusion and with no theoretical limitation on the number of dimensions. This is not exploited further in this work, as research on the use of physics-informed NNs to solve PDEs is not yet mature. Nevertheless, this approach has been shown to yield accurate identification of PDE parameters [31] and to qualitatively reproduce actual PDE solutions.

We compare the DA-MD estimate of the PDF of the model parameter $k$ with the Bayesian posterior PDF of $k$. The latter is obtained analytically by assuming a Gaussian prior $f_k(K)$ and taking advantage of the analytical solution of (12):

$$\hat f_k(K|d_{1:N_{\text{meas}}})\propto f_L\bigl(d_{1:N_{\text{meas}}}|u[(x,t)_{1:N_{\text{meas}}};K]\bigr)\,f_k(K)\approx\prod_{m=1}^{N_{\text{meas}}}f_L\bigl(d_m|u[(x,t)_m;K]\bigr)\,f_k(K). \tag{16}$$

The Bayesian and DA-MD posterior (and prior) PDFs of the random reaction rate are presented in fig. 2. The right panel shows these densities in the value space of $k$, whereas the left panel represents the state distributions as points on the dynamic manifold. The Bayesian update is optimal and analytical. Its sole source of error stems from the calculation of the normalization constant via numerical integration; as such, it is treated as a benchmark in this comparison. On the contrary, DA-MD is based on a series of approximations (closures for the CDF equation, FV and NN solutions of the CDF equation, numerical minimization of the loss function). Nevertheless, DA-MD yields an updated posterior that is close to the Bayesian one. The DA-MD posterior is sharper than the Bayesian posterior; this might be due to the effect of numerical diffusion, which artificially smears the CDF profiles computed as a solution of the CDF equation. Convergence of the DA-MD is slow, but its computational time is not expected to scale with the dimensionality of the problem (e.g., when dealing with random parameter fields). This flexibility represents a major advantage of the proposed procedure over Bayesian inference, and it is explored in a more challenging scenario in the following section.

The prior and posterior CDFs of the state at the final assimilation time are shown in fig. 3. The posterior CDFs, for both Bayesian and DA-MD assimilation, provide a state prediction that is closer to the true value than the prior CDF, thanks to more accurate parameter identification (shown in fig. 2). The value of the measurements is evaluated in terms of their impact on the shape of the CDF at the measurement locations, quantified by the KL divergence from the posterior to the prior. In this example, all locations exhibit the same information gain because of the analytical one-to-one relation between the parameter and the state.

#### 4.2.2 Random field

Keeping all other conditions and settings unchanged, we now consider a spatially varying uncertain parameter. It is treated as a second-order stationary (statistically homogeneous) multivariate Gaussian random field with constant mean and standard deviation; its two-point autocovariance function has either zero correlation length (i.e., an uncorrelated random field or white noise),

$$C^{(\mathrm{true})}_k(x-x') = \sigma_k^2\, \delta(x-x'),$$

or a finite correlation length ,

$$C^{(\mathrm{true})}_k(x-x') = \sigma_k^2 \exp\!\big(-|x-x'|/\lambda^{(\mathrm{true})}_k\big).$$

One realization with the chosen statistical parameters serves as the reference random field, which was used to construct synthetic observations via the FV solution of (1) with (12).
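Such a reference realization can be drawn, for instance, by Cholesky factorization of the covariance matrix built from the autocovariance functions above. The sketch below is illustrative (the function name, the jitter term, and the treatment of the white-noise limit are our choices):

```python
import numpy as np

def sample_gaussian_field(x, mean, sigma, corr_len, rng, n_samples=1):
    """Realizations of a stationary Gaussian field on the 1D grid x with
    exponential covariance C(x-x') = sigma^2 exp(-|x-x'|/corr_len);
    corr_len = 0 is treated as the white-noise (uncorrelated) limit."""
    n = x.size
    if corr_len == 0.0:
        return mean + sigma * rng.standard_normal((n_samples, n))
    C = sigma ** 2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
    L = np.linalg.cholesky(C + 1e-12 * np.eye(n))   # small jitter for stability
    return mean + rng.standard_normal((n_samples, n)) @ L.T
```

Cholesky sampling is exact for a Gaussian field but scales cubically with grid size; spectral or Karhunen-Loeve methods are common alternatives for finer grids.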

The coefficients (4) in the CDF equation (3) now take the form (appendix A)

$$\mathbf{Q} = \begin{pmatrix} 1 \\[2pt] -\langle k\rangle U - \sigma_k^2 U^2 \end{pmatrix}, \qquad \mathbf{D} = \begin{pmatrix} 0 & 0 \\[2pt] 0 & \dfrac{\sigma_k^2 U^2}{2} \end{pmatrix}, \tag{17}$$

if the parameter field is white noise, or

$$\mathbf{Q} = \begin{pmatrix} 1 \\[2pt] -\langle k\rangle U - \dfrac{\sigma_k^2 U}{\alpha}\big[e^{\alpha t^*}-1\big] \end{pmatrix}, \qquad \mathbf{D} = \begin{pmatrix} 0 & 0 \\[2pt] 0 & \dfrac{\sigma_k^2 U^2}{\alpha}\big[e^{\alpha t^*}-1\big] \end{pmatrix}, \tag{18}$$

with $\alpha$ and $t^*$ defined in appendix A, if the parameter field has the exponential correlation above. The corresponding dynamic manifolds have either the coordinates $(\langle k\rangle, \sigma_k)$ or the coordinates $(\langle k\rangle, \sigma_k, \lambda_k)$, respectively.
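For concreteness, the white-noise coefficients (17) can be evaluated numerically as in the following sketch (the function name and vectorization over $U$ are illustrative, not from the paper):

```python
import numpy as np

def cdf_eq_coefficients_white_noise(U, k_mean, k_std):
    """Advection vector Q and the single nonzero diffusion entry D_UU of the
    CDF equation, white-noise closure of eq. (17), at state value(s) U."""
    U = np.asarray(U, dtype=float)
    Q = np.stack([np.ones_like(U), -k_mean * U - k_std ** 2 * U ** 2])
    D_UU = 0.5 * k_std ** 2 * U ** 2      # only nonzero entry of D
    return Q, D_UU
```

These coefficients would feed directly into any FV discretization of the advection-diffusion CDF equation (3).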

Unlike the Bayesian update, which identifies the parameter values at each spatial location, with a consequent dramatic increase in the dimensionality of the target joint posterior PDF, DA-MD focuses on a finite set of parameters (the mean, the standard deviation and, in the correlated case, the correlation length); its computational cost is comparable to that of the constant-parameter case. We compare the updated DA-MD parameters with an approximation of the Bayesian posterior, since the number of random parameters and the nonlinearity of the problem preclude analytical treatment.

We employ a standard ensemble Kalman filter (EnKF) [9, 10] to update the random field, discretized into point values, with a finite ensemble size. EnKF requires multiple solutions of the physical model, which typically call for special numerical treatment because of the high spatiotemporal variability of the model parameters. The choice of spatial resolution poses another difficulty, because the correlation length of the target random field is a priori unknown. This increases the numerical complexity of EnKF, to the advantage of MD. To focus on the data-assimilation aspect of the problem, we solve both the physical model and the CDF equation on the same grid with the same FV numerical solver, thus taking advantage of MD's lower numerical complexity. The update is performed sequentially for DA-MD, and recursively for EnKF [39, and references therein], i.e., at each assimilation step the ensemble members are forecast from the initial time to the current assimilation time. In both cases (EnKF and DA-MD), we assimilate the same measurements, collected at two spatial locations in ten separate temporal instances.
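The EnKF analysis step referenced above can be sketched as follows. This is the textbook stochastic (perturbed-observations) variant with a linear observation operator, offered as a generic illustration rather than the paper's specific implementation:

```python
import numpy as np

def enkf_update(ensemble, obs_op, data, obs_cov, rng):
    """Stochastic EnKF analysis step (perturbed observations).

    ensemble : (N_ens, n_state) forecast ensemble of state/parameter vectors
    obs_op   : (n_obs, n_state) linear observation operator H
    data     : (n_obs,) measurement vector
    obs_cov  : (n_obs, n_obs) measurement-error covariance R
    """
    N = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)               # ensemble anomalies
    P = X.T @ X / (N - 1)                              # sample covariance
    S = obs_op @ P @ obs_op.T + obs_cov                # innovation covariance
    K = P @ obs_op.T @ np.linalg.inv(S)                # Kalman gain
    d_pert = data + rng.multivariate_normal(np.zeros(len(data)), obs_cov, size=N)
    innov = d_pert - ensemble @ obs_op.T               # perturbed innovations
    return ensemble + innov @ K.T
```

In practice, large-state implementations avoid forming `P` explicitly and add localization/inflation; none of that is needed for this one-dimensional illustration.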

Figures 4 and 6 exhibit the EnKF and DA-MD posterior random fields for the uncorrelated and correlated cases, respectively. When the true field is white noise, DA-MD accurately identifies the updated mean but underestimates the standard deviation. The latter might reflect the impact of artificial diffusion on the solution of the CDF equation used as a prior in the DA-MD procedure. EnKF yields a wider posterior estimate, with spatial averages of the mean and the standard deviation that are further from the corresponding moments of the true field (values in the caption).

Despite these differences in the reconstruction of the statistical properties of the posterior field, both assimilation techniques yield a posterior prediction of the state that approaches the true value of the solution (the left panel in fig. 5). The information gain from the measurements is quantified in terms of the KL divergence for both DA-MD and EnKF (the right panel in fig. 5). MD densities (both prior and posterior) are calculated via finite differences from the solution of the CDF equation, whereas EnKF densities are computed via kernel density estimation with a Gaussian kernel and Scott's bandwidth, using the ensemble members as data points. Our results suggest that, in the current configuration, DA-MD extracts more information than EnKF from the same set of measurements at almost all locations, as is also reflected in an accurate characterization of the posterior field. Observations collected in the region where characteristic lines originate from the initial conditions are more informative for DA-MD assimilation. The KL divergence for EnKF highlights the locations of the more informative measurements, displaying two distinct peaks.
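The KDE-based KL estimate described above can be reproduced, under our reading, with SciPy's `gaussian_kde` (which defaults to a Gaussian kernel with Scott's bandwidth); the evaluation grid and the masking of underflowed densities below are our choices:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kl_from_samples(post_samples, prior_samples, grid):
    """Estimate D_KL(posterior || prior) from two 1D sample sets via
    Gaussian-kernel KDE with Scott's bandwidth (scipy defaults)."""
    f_post = gaussian_kde(post_samples)(grid)
    f_prior = gaussian_kde(prior_samples)(grid)
    mask = (f_post > 0) & (f_prior > 0)          # guard against log(0) underflow
    integrand = f_post[mask] * np.log(f_post[mask] / f_prior[mask])
    return np.trapz(integrand, grid[mask])       # numerical quadrature of the KL integral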

Figure 6 exhibits the results of a similar analysis for the correlated field. DA-MD posterior estimates of the mean and standard deviation are closer to the averaged statistical properties of the true field than the EnKF estimates are (values in the figure caption). DA-MD underestimates the spatial correlation length, whereas identification of the correlation length via EnKF is inconclusive, because the posterior semivariogram does not develop a sill; we identify an intermediate plateau and take the corresponding lag as the updated correlation length for the field. The semivariogram, computed from the posterior ensemble member values, is shown in fig. 6 (right). The corresponding state CDFs are plotted in fig. 6 (left) at two representative sections that coincide with measurement locations. Both DA-MD and EnKF yield a posterior state CDF considerably closer to the true value than the prior distribution.
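A minimal empirical semivariogram, of the kind used here to estimate the correlation length from posterior ensemble members, might look as follows (the pairing tolerance and the averaging over ensemble members are our choices):

```python
import numpy as np

def semivariogram(x, fields, lags, tol):
    """Empirical semivariogram gamma(h) = 0.5 * E[(k(x) - k(x+h))^2],
    averaged over ensemble members (rows of `fields`) and matching point pairs."""
    gamma = np.full(len(lags), np.nan)
    for i, h in enumerate(lags):
        pair_vals = []
        for a in range(x.size):
            for b in range(a + 1, x.size):
                if abs(abs(x[a] - x[b]) - h) <= tol:
                    pair_vals.append(0.5 * np.mean((fields[:, a] - fields[:, b]) ** 2))
        if pair_vals:
            gamma[i] = np.mean(pair_vals)
    return gamma
```

For a field with an exponential covariance, the sill equals the variance and the semivariogram reaches roughly 63% of it at a lag of one correlation length, which is how a plateau translates into an estimate of $\lambda_k$.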

## 5 Summary and Future Work

We proposed a novel methodology for parameter estimation that leverages the method of distributions (MD) in both the forecast and analysis steps. Reduction of parameter uncertainty is recast as the identification of closure parameters for the CDF equation, which expresses the space-time evolution of uncertainty in the model output. Specifically, we identify the parameters of the CDF equation (3) that yield an estimate at the measurement locations as close as possible to the observational state distribution. The latter is expressed by an observational Bayesian posterior at each location, obtained by combining the data model and the physically based prior. The procedure is sequential, progressively updating the parameters of the CDF equation as more measurements are assimilated. We demonstrated that our method reproduces Bayesian posteriors in scenarios where Bayesian inference can be performed analytically, and improves parameter identification relative to the ensemble Kalman filter (as an approximation of the Bayesian update) in cases where exact Bayesian inference is elusive.

This work opens multiple possible research avenues. In particular, we plan to i) explore the construction of novel data-driven closure approximations for MD; ii) investigate the use of novel ML techniques for more efficient optimization and/or solution of PDEs; and iii) introduce multi-point statistics.

## Appendix A Derivation of CDF Equations

MD commences by defining a so-called raw CDF $\pi(U; x,t) = H(U - u(x,t))$, where $H(\cdot)$ is the Heaviside function. Let $f_u(U; x,t)$ denote the single-point PDF of $u$. Then it follows from the definition of the ensemble mean that

$$\mathbb{E}[\pi(U;x,t)] = \int_{U_{\min}}^{U_{\max}} H(U-U')\, f_u(U';x,t)\,\mathrm{d}U' = \int_{U_{\min}}^{U} f_u(U';x,t)\,\mathrm{d}U' = F_u(U;x,t). \tag{19}$$
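The identity (19) can be verified by Monte Carlo: averaging the raw CDF $H(U-u)$ over samples of $u$ recovers the CDF $F_u(U)$. A minimal check with a Gaussian $u$ (our toy choice of distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
u_samples = rng.normal(loc=0.0, scale=1.0, size=100_000)   # samples of the random state u

U = 0.7  # point at which the CDF is evaluated
# raw CDF pi = H(U - u); its ensemble mean approximates F_u(U), per (19)
raw_cdf_mean = np.mean(np.heaviside(U - u_samples, 1.0))
```

For a standard Gaussian state, `raw_cdf_mean` should approach $\Phi(0.7) \approx 0.758$ as the sample size grows.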

Other useful properties of $\pi$ are

$$\frac{\partial\pi}{\partial t} = \frac{\partial\pi}{\partial u}\frac{\partial u}{\partial t} = -\frac{\partial\pi}{\partial U}\frac{\partial u}{\partial t} \qquad\text{and}\qquad \nabla\pi = -\frac{\partial\pi}{\partial U}\nabla u. \tag{20}$$

Accounting for these properties, multiplication of (1) by $-\partial\pi/\partial U$ yields

$$\frac{\partial\pi}{\partial t} + \dot{\mathbf{q}}(U)\cdot\nabla\pi + r(U)\frac{\partial\pi}{\partial U} = 0, \tag{21}$$

where $\dot{\mathbf{q}} \equiv \mathrm{d}\mathbf{q}/\mathrm{d}u$. This equation is exact as long as the solutions $u(x,t)$ of (1) are smooth (do not develop shocks) for each realization of the random parameters. It is subject to initial and boundary conditions derived from the initial and boundary conditions of the physical problem, and to the conditions $\pi(U_{\min};x,t)=0$ and $\pi(U_{\max};x,t)=1$.

In the absence of uncertainty, (21) is deterministic and equiv