# Piecewise polynomial approximation of probability density functions with application to uncertainty quantification for stochastic PDEs

The probability density function (PDF) associated with a given set of samples is approximated by a piecewise-linear polynomial constructed with respect to a binning of the sample space. The kernel functions are a compactly supported basis for the space of such polynomials and are centered at the bin nodes rather than at the samples, as is the case for the standard kernel density estimation approach. This feature provides an approximation that is scalable with respect to the sample size. On the other hand, unlike other strategies that place kernel functions at bin nodes, the new approximation does not require the solution of a linear system. In addition, a simple rule that relates the bin size to the sample size eliminates the need for bandwidth selection procedures. The new density estimator has unitary integral, does not require a constraint to enforce positivity, and is consistent. The new approach is validated through numerical examples in which samples are drawn from known PDFs. The approach is also used to determine approximations of (unknown) PDFs associated with outputs of interest that depend on the solution of a stochastic partial differential equation.


## 1 Introduction

The problem of estimating a probability density function (PDF) associated with a given set of samples is of major relevance in a variety of mathematical and statistical applications; see, e.g., [1, 5, 6, 9, 11, 13, 18, 25, 29]. Histograms are perhaps the most popular means used in practice for this purpose. A histogram is a piecewise-constant approximation of an unknown PDF that is based on the subdivision of the sample space into subdomains that are commonly referred to as bins. For simplicity, consider the case in which all bins have equal volume. The value of the histogram on any bin is given by the number of samples that lie within that bin along with a global scaling applied to ensure that the histogram has unitary integral. As an example, assume that a bounded one-dimensional sample domain $\Gamma$ has been discretized into a set of $N_{\rm bins}$ non-overlapping, covering bins (i.e., intervals) $B_\ell$ of length $\delta$. Assume also that one has in hand $M$ samples $\{Y_m\}_{m=1}^M$ of an unknown PDF $f(Y)$. For every bin $B_\ell$, the value of the histogram function on $B_\ell$ is given by

$$ f_{\rm histo}^{\delta,M}(Y)\Big|_{B_\ell} = \frac{1}{\delta M}\sum_{m=1}^{M} \mathcal{X}_{B_\ell}(Y_m), \qquad \ell = 1,\ldots,N_{\rm bins}, \tag{1} $$

where $\mathcal{X}_{B_\ell}$ denotes the indicator function for $B_\ell$. The above formula easily generalizes to higher-dimensional sample domains. Although the histogram is probably the easiest means, with regard to implementation, for estimating a PDF, it suffers from some limitations. For instance, being only a piecewise-constant approximation, it is discontinuous across bin boundaries; moreover, the use of piecewise-constant approximations severely limits the discretization accuracy one can achieve.
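As a concrete illustration of (1), the following minimal sketch (our own example, not code from the paper) builds a uniform-bin histogram in one dimension; the global factor $1/(\delta M)$ makes the estimate integrate to one exactly by construction:

```python
import random

def histogram_estimate(samples, a, b, n_bins):
    """Piecewise-constant histogram estimator, Eq. (1): the value on bin
    B_l is (# samples in B_l) / (delta * M)."""
    M = len(samples)
    delta = (b - a) / n_bins
    counts = [0] * n_bins
    for y in samples:
        l = min(int((y - a) / delta), n_bins - 1)  # clamp the right endpoint
        counts[l] += 1
    return [c / (delta * M) for c in counts]

# Uniform samples on [0, 1]: the histogram should be close to 1 on every bin.
random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(100_000)]
h = histogram_estimate(samples, 0.0, 1.0, n_bins=10)
delta = 0.1
total = sum(v * delta for v in h)  # integral of the histogram
```

Note that `total` equals one up to rounding regardless of the sample set, which is exactly the unitary-integral scaling described above.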

An alternative method capable of overcoming the differentiability issue is kernel density estimation (KDE), in which the PDF is approximated by a sum of smooth kernel functions centered at the samples so that a desired smoothness of the approximation can be obtained [18]. Considering again a one-dimensional sample domain $\Gamma$, the KDE approximation is defined as

$$ f_{\rm kde}(Y) = \frac{1}{bM}\sum_{m=1}^{M} K\!\left(\frac{Y - Y_m}{b}\right), \tag{2} $$

where $b$, which is referred to as the bandwidth, usually governs the decay rate of the kernel $K$ as $|Y - Y_m|$ increases. KDE approximations have been shown to be effective in a variety of applications; see, e.g., [13, 20, 27, 28]. However, despite possessing smoothness, KDE has some limitations. For example, the accuracy of the KDE approximation depends on the choice of kernel, and the choice of the bandwidth also strongly affects the accuracy [17, 26]. More importantly, the KDE method does not scale well with the size of the sample data set, i.e., for a given $Y$, the evaluation of the KDE approximation (2) requires $M$ kernel evaluations, so that clearly evaluating (2) becomes more expensive as the value of $M$ grows. The method based on spline smoothing given in [16] has features that are similar to our approach and is also scalable with respect to the sample size. Our method presents several advantages compared to that of [16]. First, to determine the coefficients, the solution of a linear system is not required. Second, our method does not involve a smoothing factor, so that there is no need to tune the method with respect to such a factor. Third, no special constraints have to be introduced to ensure positivity of the PDF approximation. We note that in [23], sparse-grid sampling is substituted for the Monte Carlo sampling in the method of [16], allowing larger problems to be considered. Of course, sparse-grid sampling can be substituted for Monte Carlo sampling in our approach as well.
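The per-evaluation cost noted above is easy to see in code. The sketch below (a hypothetical example using a Gaussian kernel; the paper does not prescribe a particular kernel) touches all $M$ samples for every point at which (2) is evaluated:

```python
import math
import random

def kde(y, samples, bandwidth):
    """Kernel density estimate (2) with a Gaussian kernel K: every single
    evaluation loops over all M samples, hence O(M) cost per point."""
    M = len(samples)
    s = sum(math.exp(-0.5 * ((y - ym) / bandwidth) ** 2) for ym in samples)
    return s / (bandwidth * M * math.sqrt(2.0 * math.pi))

random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(20_000)]
# True N(0,1) density at 0 is 1/sqrt(2*pi) ~ 0.3989.
est = kde(0.0, samples, bandwidth=0.2)
```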

In our approach, the unknown PDF is approximated by a piecewise-polynomial function, specifically a piecewise-linear polynomial, that is defined, as are histograms, with respect to a subdivision of the sample domain into bins. The approach is consistent in the sense that the approximate PDF converges to the exact PDF with respect to the $L^2(\Gamma)$ norm as the number of samples tends to infinity and the volume of the bins tends to zero.

The paper is structured as follows. In Section 2, the mathematical foundation of our approach is laid out; there, it is shown analytically that the proposed approximation satisfies several requirements needed for it to be considered as a PDF estimator. A numerical investigation of the accuracy and computational costs incurred by our approach is then provided. First, in Section 3, our method is tested and validated using sample sets associated with different types of known PDFs so that exact errors can be determined. Then, in Section 4, the method is applied to the estimation of unknown PDFs of outputs of interest associated with the solution of a stochastic partial differential equation. Finally, concluding remarks and future work are discussed in Section 5.

## 2 Piecewise-linear polynomial approximations of PDFs

Let $Y = (Y_1, \ldots, Y_{N_\Gamma})$ denote a multivariate random variable belonging to a closed, bounded parameter domain $\Gamma$ which, for simplicity, we assume is a polytope in $\mathbb{R}^{N_\Gamma}$. The probability density function (PDF) $f(Y)$ corresponding to $Y$ is not known. However, we assume that we have in hand a data set of $M$ samples $Y_m$, $m = 1,\ldots,M$, of $Y$. The goal is to construct an estimator for $f(Y)$ using the given data set of samples. To this end, we use an approximating space that is popular in the finite element community [7, 10].

Let $\{B_\ell\}_{\ell=1}^{N_{\rm bins}}$ denote a covering, non-overlapping subdivision of the sample domain $\Gamma$ into $N_{\rm bins}$ bins. (In the partial differential equation (PDE) setting, what we refer to as bins are often referred to as grid cells or finite elements or finite volumes. We instead refer to the subdomains as bins because that is the notation in common use for histograms, which we use to compare to our new approach. Furthermore, in Section 4, we also use finite element grids for spatial discretization of partial differential equations, so that using the notation "bins" for parameter domain subdivisions helps us differentiate between subdivisions of parameter and spatial domains. For the same reason, we use $\delta$ instead of $h$ to parametrize parameter bin sizes because $h$ is in common use to parametrize spatial grid sizes.) Here, $\delta$ parametrizes the subdivision and may be taken as, e.g., the largest diameter of any of the bins $B_\ell$. The bins are usually chosen to be simplices or hyper-quadrilaterals; for example, if $N_\Gamma = 2$, they would be triangles or quadrilaterals. It is also assumed that the faces of the bins are either also complete faces of abutting bins or part of the boundary of $\Gamma$. From a practical point of view, our considerations are limited to relatively small $N_\Gamma$ because the number of bins grows exponentially with $N_\Gamma$. Detailed discussion about the subdivisions we use can be found in, e.g., [7, 10].

Let $\{\hat{Y}_j\}_{j=1}^{N_{\rm nodes}}$ denote the set of nodes, i.e., vertices, of the subdivision $\{B_\ell\}$, with $N_{\rm nodes}$ denoting the number of nodes. Based on the subdivision $\{B_\ell\}$, we define the space of continuous piecewise polynomials

$$ V^\delta = \big\{ v \in C(\Gamma) : v|_{B_\ell} \in P^1(B_\ell) \ \text{for } \ell = 1,\ldots,N_{\rm bins} \big\}, $$

where, for simplicial elements, $P^1(B_\ell)$ denotes the space of multivariate linear polynomials and, for hyper-quadrilateral elements, $P^1(B_\ell)$ denotes the space of $N_\Gamma$-linear polynomials, e.g., bilinear and trilinear polynomials in two and three dimensions, respectively.

A basis $\{\phi_j\}_{j=1}^{N_{\rm nodes}}$ for $V^\delta$ is given by, for $j = 1,\ldots,N_{\rm nodes}$,

$$ \phi_j \in V^\delta \quad\text{such that}\quad \phi_j(\hat{Y}_{j'}) = \delta_{jj'} \quad\text{for } j' = 1,\ldots,N_{\rm nodes}, $$

where $\delta_{jj'}$ denotes the Kronecker delta function. In detail, $\{\phi_j\}_{j=1}^{N_{\rm nodes}}$ denotes the continuous piecewise-linear or piecewise $N_\Gamma$-linear Lagrangian FEM basis corresponding to $V^\delta$, i.e., we have that, for $j = 1,\ldots,N_{\rm nodes}$,

• for simplicial bins, $\phi_j$ is a linear function on each bin $B_\ell$, $\ell = 1,\ldots,N_{\rm bins}$;

• for hyper-quadrilateral bins, $\phi_j$ is an $N_\Gamma$-linear function on each bin $B_\ell$, $\ell = 1,\ldots,N_{\rm bins}$, e.g., for $N_\Gamma = 1, 2, 3$, a linear, bilinear, or trilinear function, respectively;

• $\phi_j$ is continuous on $\Gamma$;

• $\phi_j = 1$ at the $j$-th node $\hat{Y}_j$ of the subdivision $\{B_\ell\}$; and

• $\phi_j = 0$, if $j' \neq j$, at the $j'$-th node $\hat{Y}_{j'}$ of the subdivision $\{B_\ell\}$.

For $j = 1,\ldots,N_{\rm nodes}$, let $S_j$ denote the support of $\phi_j$; note that $S_j$ consists of the union of the bins having the node $\hat{Y}_j$ as one of its vertices. Thus, the basis functions have compact support with respect to $\Gamma$. An illustration of the basis functions in one dimension is given in Figure 1. We further let $N_j$, for $j = 1,\ldots,N_{\rm nodes}$, denote the number of bins in $S_j$, i.e., the number of bins that share the vertex $\hat{Y}_j$.

Note that the approximating space $V^\delta$ and the basis $\{\phi_j\}$ are in common use for the finite element discretization of partial differential equations. Details about the geometric subdivision $\{B_\ell\}$, the approximation space $V^\delta$, and the basis functions $\{\phi_j\}$ and their properties may be found in, e.g., [7, 10].

Below we make use of two properties of the basis $\{\phi_j\}$. First, we have the well-known relation

$$ \sum_{j=1}^{N_{\rm nodes}} \phi_j(Y) = 1 \qquad \forall\, Y \in \Gamma. \tag{3} $$

We also have that

$$ C_j = \int_\Gamma \phi_j(Y)\,dY = \int_{S_j} \phi_j(Y)\,dY = \sum_{B_\ell \subset S_j} \int_{B_\ell} \phi_j(Y)\,dY. \tag{4} $$

Note that, in general, $C_j$ is proportional to $\delta^{N_\Gamma}$.

### 2.1 The piecewise-linear approximation of a PDF

Given the $M$ sample values $\{Y_m\}_{m=1}^M$ in $\Gamma$, we define the approximation $f^{\delta,M}$ of the unknown PDF $f$ given by

$$ f(Y) \approx f^{\delta,M}(Y) = \sum_{j=1}^{N_{\rm nodes}} F_j\,\phi_j(Y) \in V^\delta, \tag{5} $$

where

$$ F_j = \frac{1}{M C_j} \sum_{Y_m \in S_j} \phi_j(Y_m) \quad\text{with}\quad C_j = \int_{S_j} \phi_j(Y)\,dY, \qquad j = 1,\ldots,N_{\rm nodes}. $$

Note that only the samples $Y_m \in S_j$, i.e., only the samples in the support of the basis function $\phi_j$, are used to determine $F_j$.
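On a uniform one-dimensional grid, the coefficients $F_j$ of (5) can be accumulated in a single pass over the samples, since each sample contributes only to the two hat functions whose supports contain it. The following sketch is our own illustrative implementation of this special case, not the authors' code; it assumes the standard hat-function normalizations $C_j = \delta$ at interior nodes and $C_j = \delta/2$ at the two boundary nodes:

```python
import random

def hat_coefficients(samples, a, b, n_delta):
    """Coefficients F_j of the estimator (5) on a uniform 1D grid on [a, b]:
    F_j = (1 / (M * C_j)) * sum of phi_j(Y_m) over samples in supp(phi_j)."""
    M = len(samples)
    delta = (b - a) / n_delta
    F = [0.0] * (n_delta + 1)                        # one coefficient per node
    for y in samples:
        l = min(int((y - a) / delta), n_delta - 1)   # bin containing y
        t = (y - (a + l * delta)) / delta            # local coordinate in [0, 1]
        F[l] += 1.0 - t                              # weight of hat phi_l
        F[l + 1] += t                                # weight of hat phi_{l+1}
    for j in range(n_delta + 1):
        Cj = delta if 0 < j < n_delta else delta / 2.0
        F[j] /= M * Cj
    return F

random.seed(2)
samples = [random.uniform(0.0, 1.0) for _ in range(200_000)]
F = hat_coefficients(samples, 0.0, 1.0, n_delta=8)
# The estimator integrates to one by construction: sum_j F_j * C_j = 1.
delta = 1.0 / 8
integral = sum(F[j] * (delta if 0 < j < 8 else delta / 2.0) for j in range(9))
```

No linear solve and no positivity constraint is needed: each $F_j$ is a nonnegative weighted count, and the weighted sum above reproduces the unit integral exactly.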

Of course, the approximate PDF (5) should be a PDF in its own right. That it is indeed a PDF is shown in the following lemma.

###### Lemma 1

$f^{\delta,M}(Y) \geq 0$ for all $Y \in \Gamma$ and $\int_\Gamma f^{\delta,M}(Y)\,dY = 1$.

###### Proof 1

Clearly $f^{\delta,M}$ is non-negative because it is a linear combination of non-negative functions with non-negative coefficients. Furthermore,

$$ \int_\Gamma f^{\delta,M}(Y)\,dY = \sum_{j=1}^{N_{\rm nodes}} F_j \int_\Gamma \phi_j(Y)\,dY = \sum_{j=1}^{N_{\rm nodes}} \frac{1}{M C_j} \sum_{m=1}^{M} \phi_j(Y_m) \int_\Gamma \phi_j(Y)\,dY = \frac{1}{M} \sum_{j=1}^{N_{\rm nodes}} \sum_{m=1}^{M} \phi_j(Y_m) = \frac{1}{M} \sum_{m=1}^{M} 1 = 1. \tag{6} $$

The third and fourth equalities hold because of (4) and (3), respectively.

For simplicity, from now on we only focus on bins consisting of $N_\Gamma$-dimensional hypercubes of side $\delta$. The next lemma is useful to prove the convergence of our approximation.

###### Lemma 2

Let $f \in C^2(\Gamma)$ and let $E[F_j]$ denote the expectation of $F_j$ with respect to $f$. Then

$$ \big| f(\hat{Y}_j) - E[F_j] \big| \leq C \delta^\alpha, \tag{7} $$

where the constant $C$ does not depend on either $\delta$ or $M$, and $\alpha$ is a positive integer. If the distance between the support of $f$ and the boundary $\partial\Gamma$ is bounded below by some positive constant, then $\alpha = 2$. Otherwise $\alpha = 1$.

###### Proof 2

Let $\mathcal{X}$ be the characteristic function of $[-1,1]$ and define $\hat{\phi}(y) = (1 - |y|)\,\mathcal{X}(y)$. Let $\phi(Y) = \prod_{n=1}^{N_\Gamma} \hat{\phi}(Y_n)$ be defined in the usual tensor-product fashion. Assuming the bins are hypercubes of side $\delta$, we have $C_j = \delta^{N_\Gamma}$ for all $j$. Then

$$ \phi_j(Y_m) = \phi\!\left(\frac{\hat{Y}_j - Y_m}{\delta}\right), \qquad F_j = \frac{1}{\delta^{N_\Gamma} M} \sum_{m=1}^{M} \phi\!\left(\frac{\hat{Y}_j - Y_m}{\delta}\right). \tag{8} $$

Thus, $F_j$ is the value of a kernel density estimator of $f$ evaluated at $\hat{Y}_j$, with the function $\phi$ as a kernel. Using a standard argument for the bias of kernel density estimators, we have that

$$ \big| f(\hat{Y}_j) - E[F_j] \big| \leq \delta \left| \nabla f(\hat{Y}_j) \cdot \int_{[-1,1]^{N_\Gamma} \cap\, \Gamma} \phi(Y')\, Y' \, dY' \right| + C \delta^2. \tag{9} $$

The above inequality proves the result for $\alpha = 1$. Thanks to the symmetry of $\phi$, if the distance between the support of $f$ and $\partial\Gamma$ is bounded below by some positive constant, then the first-order term in (9) vanishes, hence the result follows with $\alpha = 2$.

The next theorem shows that the approximate PDF obtained with our method converges to the exact PDF with respect to the $L^2(\Gamma)$ norm.

###### Theorem 1

Let $\Gamma$ be a polytope in $\mathbb{R}^{N_\Gamma}$ and let $f \in C^2(\Gamma)$. If $f^{\delta,M}$ is the approximation of $f$ given in Eq. (5), then:

$$ \lim_{\delta \to 0} \lim_{M \to \infty} \| f - f^{\delta,M} \|_{L^2(\Gamma)} = 0. $$

Moreover, if the distance between the support of $f$ and $\partial\Gamma$ is bounded below by some positive constant, then

$$ \lim_{M \to \infty} \| f - f^{\delta,M} \|_{L^2(\Gamma)} \leq C \delta^2, $$

where $C$ is a constant that does not depend on $\delta$ or $M$.

###### Proof 3

Let $I^\delta f$ be the finite element nodal interpolant of $f$; then

$$ \| f - f^{\delta,M} \|_{L^2(\Gamma)} \leq \| f - I^\delta f \|_{L^2(\Gamma)} + \| I^\delta f - f^{\delta,M} \|_{L^2(\Gamma)} \leq C_1 \delta^2 + \| I^\delta f - f^{\delta,M} \|_{L^2(\Gamma)}, \tag{11} $$

where $C_1$ is a constant that does not depend on $\delta$ [7]. Considering the second term in the above inequality, we have

$$ \begin{aligned} \| I^\delta f - f^{\delta,M} \|_{L^2(\Gamma)} &\leq \sqrt{ \int_\Gamma \Big( \sum_{j=1}^{N_{\rm nodes}} \big| f(\hat{Y}_j) - F_j \big|\, \phi_j \Big)^{2} } \\ &\leq \sqrt{ \int_\Gamma \Big[ \Big( \sum_{j=1}^{N_{\rm nodes}} \big| f(\hat{Y}_j) - E[F_j] \big|\, \phi_j \Big) + \Big( \sum_{j=1}^{N_{\rm nodes}} \big| E[F_j] - F_j \big|\, \phi_j \Big) \Big]^{2} } \\ &\leq \sqrt{ \int_\Gamma \Big[ C_2 \delta^\alpha + \Big( \sum_{j=1}^{N_{\rm nodes}} \big| E[F_j] - F_j \big|\, \phi_j \Big) \Big]^{2} }. \end{aligned} \tag{13} $$

The last inequality is obtained using Lemma 2 and Eq. (3). Considering that $C_j = \delta^{N_\Gamma}$, we have

$$ \big| E[F_j] - F_j \big| = \left| E\!\left[\frac{\phi_j}{\delta^{N_\Gamma}}\right] - \frac{1}{M} \sum_{m=1}^{M} \frac{\phi_j(Y_m)}{\delta^{N_\Gamma}} \right| = \frac{1}{\delta^{N_\Gamma}} \left| E[\phi_j] - \frac{1}{M} \sum_{m=1}^{M} \phi_j(Y_m) \right| \leq \frac{C_{\delta^{N_\Gamma}}}{\delta^{N_\Gamma} \sqrt{M}}, \tag{16} $$

where $0 \leq \phi_j(Y) \leq 1$ for all $Y \in \Gamma$, and $C_{\delta^{N_\Gamma}}$ depends on $\delta$ but not on $M$. Hence

$$ \| f - f^{\delta,M} \|_{L^2(\Gamma)} \leq C_1 \delta^2 + C_2 \delta^\alpha + \frac{C_{\delta^{N_\Gamma}}}{\delta^{N_\Gamma} \sqrt{M}}, \tag{17} $$

so the first result is obtained. If the distance between the support of $f$ and $\partial\Gamma$ is bounded below by some positive constant, then $\alpha = 2$ in Eq. (17), so the second result also follows taking the limit as $M \to \infty$.

We note that the numerical examples considered below show that convergence can be obtained even for cases where the PDF is not in $C^2(\Gamma)$, i.e., even when the PDF is not differentiable or not even continuous.

### 2.2 Numerical illustrations

In Section 3, we validate our approach by approximating known joint PDFs $f(Y)$. Of course, in comparing approximations to an exact known PDF, we pretend that we have no or very little knowledge about the latter except that we have available $M$ samples $Y_m$, $m = 1,\ldots,M$, drawn from the PDF. Comparing with known PDFs allows us to precisely determine errors in the approximation of the PDF determined using our method. Then, in Section 4, we use our method to approximate the PDFs of outputs of interest associated with the solution of a stochastic partial differential equation; in that case, the PDF is not known. All computations were performed on a Dell Inspiron 15, 5000 series laptop with an Intel(R) Core(TM) i3-4030U CPU at 1.90 GHz and 8 GB of RAM.

Note that in all the numerical examples, $\Gamma$ denotes a sampling domain, i.e., all samples lie within $\Gamma$. For most cases, $\Gamma$ is also the support domain for the PDF. However, we also consider the case in which the support of the PDF is not known beforehand so that the sampling domain is merely assumed to contain, but not be the same as, the support domain.

For simplicity, the sample space is assumed to be a bounded $N_\Gamma$-dimensional box $\Gamma = [a,b]^{N_\Gamma}$. We subdivide the parameter domain into a congruent set of bins consisting of $N_\Gamma$-dimensional hypercubes of side $\delta = (b-a)/N_\delta$, where $N_\delta$ denotes the number of intervals in the subdivision of $[a,b]$ along each of the coordinate directions. We then have that the number of bins is given by $N_{\rm bins} = N_\delta^{N_\Gamma}$ and the number of nodes is given by $N_{\rm nodes} = (N_\delta + 1)^{N_\Gamma}$. Our new method (5) can be easily extended to more general parameter domains, including ones of infinite extent, and to more general meshing strategies, but at the cost of substantially more complex notation. Likewise, for simplicity, we assume throughout that the components of the random variable $Y$ are independently distributed so that the joint PDFs are given as the product of univariate PDFs; our method can also be extended to cases in which the components of $Y$ are correlated.

## 3 Validation through comparisons with known PDFs

In this section, we assume that we have available $M$ samples $\{Y_m\}_{m=1}^M$ drawn from a known PDF $f(Y)$. The error incurred by any approximation $f_{\rm approx}$ of the exact PDF $f$ is measured by

$$ E_{f_{\rm approx}} = \left( \frac{1}{M} \sum_{m=1}^{M} \big( f(Y_m) - f_{\rm approx}(Y_m) \big)^2 \right)^{1/2}. \tag{18} $$

In particular, we use this error measure for our approximation $f^{\delta,M}$ defined in (5).
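The error measure (18) is a root mean square over the sample points; a minimal sketch (with hypothetical toy functions, purely for illustration):

```python
import math

def rms_error(f_exact, f_approx, samples):
    """Root-mean-square error (18), evaluated at the sample points."""
    M = len(samples)
    return math.sqrt(sum((f_exact(y) - f_approx(y)) ** 2 for y in samples) / M)

# Toy check (not from the paper): approximating f(y) = y by g(y) = y + 0.1
# gives an RMS error of exactly 0.1 at any set of sample points.
err = rms_error(lambda y: y, lambda y: y + 0.1, [0.0, 0.25, 0.5, 0.75, 1.0])
```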

The accuracy of approximations of a PDF, be they by histograms or by our method, depends on both $M$ (the number of samples available) and $\delta$ (the length of the bin edges). Thus, $M$ and $\delta$ should be related to each other in such a way that errors due to sampling and bin size are commensurate with each other. Thus, if the bin-size errors are of $O(\delta^r)$ and the sampling error is of $O(1/\sqrt{M})$, we set

$$ M = \delta^{-2r} = \left( \frac{N_\delta}{b - a} \right)^{2r}. \tag{19} $$

Thus, once $r$ and the interval $[a,b]$ are specified, one can choose $\delta$ (or equivalently $N_\delta$) and the value of $M$ is determined by (19), or vice versa. Clearly, $M$ increases as $\delta$ decreases.

For most of the convergence rate illustrations given below, we choose

$$ N_\delta = 2^k, \quad k = 1, 2, \ldots, \quad\text{so that}\quad \delta = \frac{b-a}{2^k} \quad\text{and}\quad M = N_\delta^{2r}. \tag{20} $$

Note that neither (19) nor (20) depends on the dimension $N_\Gamma$ of the parameter domain but, of course, $N_{\rm bins}$ and $N_{\rm nodes}$ do. If the variance of the PDF is large, one may want to increase the size of $M$ by multiplying the term in (19) by the variance.
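The coupling rule (20) can be tabulated directly. The sketch below (our own illustration) shows that with $r = 2$ each halving of $\delta$ multiplies the required sample count by $2^{2r} = 16$:

```python
def coupling(k, a, b, r):
    """Bin size and sample count from the coupling rule (20):
    N_delta = 2**k, delta = (b - a) / 2**k, and M = N_delta**(2r)."""
    n_delta = 2 ** k
    delta = (b - a) / n_delta
    M = n_delta ** (2 * r)
    return delta, M

# With second-order bin-size accuracy (r = 2) on [-5.5, 5.5]:
d1, M1 = coupling(3, -5.5, 5.5, r=2)   # delta = 11/8,  M = 8**4
d2, M2 = coupling(4, -5.5, 5.5, r=2)   # delta = 11/16, M = 16**4 = 16 * M1
```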

The computation of the coefficients $F_j$ defined in (5) may be costly if $M$ is large and, consequently, $\delta$ is small. To improve the computational efficiency, we evaluate a basis function $\phi_j$ at a sample point $Y_m$ only if the point is within the support $S_j$ of $\phi_j$. However, the determination of the bin $B_\ell$ such that $Y_m \in B_\ell$ may be expensive in case of large $M$ and $N_{\rm bins}$. For this task, we employ the efficient particle locating algorithm described in [8].
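For the uniform hypercube grids used in this section, bin location reduces to integer arithmetic; general unstructured subdivisions require a locator such as the algorithm of [8]. A sketch of the uniform-grid special case (our own illustration, not the authors' code):

```python
def locate_bin(Y, a, delta, n_delta):
    """For a uniform grid of hypercube bins of side delta on [a, b]^d,
    the bin containing the point Y is found componentwise by integer
    arithmetic in O(d) time."""
    idx = []
    for y in Y:
        i = int((y - a) / delta)
        idx.append(min(max(i, 0), n_delta - 1))  # clamp points on the boundary
    return tuple(idx)

# 2D example: delta = 0.25 on [0, 1]^2; the point (0.6, 0.1) lies in bin (2, 0).
cell = locate_bin((0.6, 0.1), 0.0, 0.25, 4)
```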

### 3.1 A smooth PDF with known support

For the first example we consider, we ignore the fact that we know the exact PDF we are trying to approximate, but assume we know the support of the PDF so that the sampling domain is also the support domain. We also use this example to illustrate that the number of Monte Carlo samples needed is independent of the dimension of the parameter space.

We set $\Gamma = [-5.5, 5.5]^{N_\Gamma}$ and assume that the components $Y_n$ of the random vector $Y$ are independently and identically distributed according to a truncated standard Gaussian PDF so that the joint PDF is given by

$$ f(Y) = \prod_{n=1}^{N_\Gamma} \frac{1}{\sqrt{2\pi}\, C_G} \exp\!\left( -\frac{Y_n^2}{2} \right) \quad\text{for } Y \in \Gamma = [-5.5, 5.5]^{N_\Gamma}, \tag{21} $$

with

$$ C_G = \tfrac{1}{2} \big( \mathrm{erf}(5.5/\sqrt{2}) - \mathrm{erf}(-5.5/\sqrt{2}) \big). $$

The scaling factor $C_G$ is introduced to ensure that we indeed have a PDF, i.e., that the integral of $f$ over $\Gamma$ is unity. Note that because the standard deviation of the underlying standard Gaussian PDF is unity, the values of the truncated Gaussian distribution (21) near the faces of the box $\Gamma$ are very small, so that the results of this example are given to a precision such that they would not change if one considers instead the (non-truncated) standard Gaussian distribution. Also note that because the second moment of the standard Gaussian distribution is unity, the absolute error (18) is also very close to the error relative to the given PDF.

Before we use the formula (20) to relate $M$ and $\delta$, we first separately examine the convergence rates with respect to $\delta$ and $M$. To this end, to illustrate the convergence with respect to $\delta$, we set

$$ M = 10^7 \quad\text{and}\quad \delta = \frac{11}{2^k} \quad\text{for } k = 3, 4, 5, 6, \tag{22} $$

so that the error due to sampling is relatively negligible. For illustrating the convergence with respect to $M$, we set

$$ \delta = \frac{11}{2^8} \quad\text{and}\quad M = 10^k \quad\text{for } k = 3, 4, 5, 6, \tag{23} $$

so that the error due to the bin size is relatively negligible. The plots in Figure 2 illustrate the second-order convergence rate with respect to $\delta$ and the half-order convergence rate with respect to $M$, as predicted in Theorem 1.

We now turn to relating $M$ and $\delta$ using the formula (20). We consider the multivariate truncated standard Gaussian PDF (21). Plots of the error vs. both $\delta$ and $M$ are given in Figure 3, from which we observe, in all cases, the second-order convergence rate with respect to $\delta$ and the half-order convergence rate with respect to $M$. We also observe that the errors and the number of samples used are largely independent of the value of $N_\Gamma$. A visual comparison of the exact truncated standard Gaussian distribution (21) and its approximation (5) for the bivariate case is given in Figure 4.

Computational costs are reported in Table 1 in which, for each $N_\Gamma$, we choose $N_\delta$ in (20) to determine $\delta$ and $M$. Reading vertically for each $N_\Gamma$, we see the increase in computational costs due to the decrease in $\delta$ and the related increase in $M$. Reading horizontally, so that $\delta$ and $M$ are fixed, the increase in costs is due to the increasing number of bins and nodes as $N_\Gamma$ increases. We note that our method is amenable to highly scalable parallelization not only as $\delta$ decreases and $M$ increases, but also as $N_\Gamma$ increases so that, through parallelization, our method may prove to be useful in dimensions higher than those considered here. When developing a parallel implementation of our method, using a particle locating algorithm such as that of [8] to locate a given node that is shared by several processors would be crucial to realizing the gains in efficiency due to parallelization.

### 3.2 A smooth PDF with unknown support

Still considering a known PDF, we now consider a case for which we not only pretend we do not know the PDF, but also do not know its support. Specifically, we consider the uniform distribution on $[-1, 1]$. A univariate distribution suffices for the discussions of this case; multivariate distributions can be handled by the obvious extensions of what is said here about the univariate case. We assume that we know that the support of the known PDF lies within a larger interval $[a, b]$. Of course, we may be mistaken about this, so that once we examine the sample set $\{Y_m\}_{m=1}^M$, we may observe that some of the samples fall outside of $[a, b]$. In this case we can enlarge the interval until we observe that the interval spanned by the smallest to largest sample values is contained within the new $[a, b]$.

We first simply assume that we have determined, either through external knowledge or by the process just described, that the support of the PDF we are considering lies somewhere within the interval $[-1.5, 1.5]$. Not knowing the true support, we not only sample in the larger interval (so that here we have $a = -1.5$ and $b = 1.5$), but we also build the approximate PDF with respect to $[-1.5, 1.5]$. We remark that a uniform distribution provides a stern test when the support of the distribution is not known because that distribution is as large at the boundaries of its support as it is in the interior. Distributions that are small near the boundaries of their support, e.g., the truncated Gaussian distribution of Section 3.1, would yield considerably smaller errors and better convergence rates compared to what are obtained for the uniform distribution. Choosing $r = 2$ in (20), we obtain the errors plotted in Figure 5. Clearly, the convergence rates are nowhere near optimal. Of course, the reason for this is that by building the approximation with respect to $[-1.5, 1.5]$, we are not approximating the uniform distribution on $[-1, 1]$, but instead we are approximating the discontinuous distribution

$$ f_{[-1.5,1.5]}(Y) = \begin{cases} \tfrac{1}{2} & \text{for } Y \in [-1, 1] \\[2pt] 0 & \text{for } Y \in [-1.5, -1) \text{ and } Y \in (1, 1.5]. \end{cases} $$

For comparison purposes we provide, in Figure 6, results for the case where we use the support interval $[-1, 1]$ for both sampling and approximation construction. Because now the PDF is smooth, in fact constant, throughout the interval in which the approximation is constructed, we obtain optimal convergence rates.

One can improve on the results of Figure 5, even if one does not know the support of the PDF one is trying to approximate, by taking advantage of the fact that the samples obtained necessarily have to belong to the support of the PDF and therefore provide an estimate for that support. For instance, for the example we are considering, one could proceed as follows.

1. For a chosen $M$, sample over $[-1.5, 1.5]$.

2. Determine the minimum and maximum values $Y_{\min}$ and $Y_{\max}$, respectively, of the sample set $\{Y_m\}_{m=1}^M$.

3. Choose the number of bins $N_\delta$ and set $\delta = (Y_{\max} - Y_{\min}) / N_\delta$.

4. Build the approximation over the interval $[Y_{\min}, Y_{\max}]$ with a bin size $\delta$.
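Steps 2 and 3 of the procedure above can be sketched as follows (an illustrative example with a uniform distribution on $[-1, 1]$, mirroring this section; not the authors' code):

```python
import random

def estimate_support_and_binning(samples, n_delta):
    """Estimated support [Ymin, Ymax] from the sample extremes, and the
    bin size delta for a chosen number of bins n_delta."""
    y_min, y_max = min(samples), max(samples)
    delta = (y_max - y_min) / n_delta
    return y_min, y_max, delta

# Samples from the uniform distribution on [-1, 1]; as M grows, the sample
# extremes converge to the true support endpoints -1 and 1.
random.seed(3)
samples = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
y_min, y_max, delta = estimate_support_and_binning(samples, n_delta=16)
```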

It is reasonable to expect that as $M$ increases, the interval $[Y_{\min}, Y_{\max}]$ becomes a better approximation to the true support interval $[-1, 1]$. Figure 7 illustrates the convergence of $[Y_{\min}, Y_{\max}]$ to $[-1, 1]$. Note that because $[Y_{\min}, Y_{\max}] \subset [-1, 1]$, the exact PDF is continuous within $[Y_{\min}, Y_{\max}]$. Thus, it is also reasonable to expect that, because the approximate PDF is built with respect to an interval which is contained within the support of the exact PDF, there will be an improvement in the accuracy of that approximation compared to that reported in Figure 5 and, in particular, that as one increases $N_\delta$ so that $\delta$ decreases and $M$ increases, better rates of convergence will be obtained. Figure 8 corresponds to the application of this procedure and shows the substantially smaller errors and substantially higher convergence rates compared to those reported in Figure 5.

A visual comparison of the approximations obtained using the smallest $\delta$/largest $M$ pairing corresponding to Figures 5, 6, and 8 is given in Figure 9. The defects resulting from the use of the interval $[-1.5, 1.5]$ for constructing the approximation of a uniform PDF that has support on the interval $[-1, 1]$ are clearly evident. On the other hand, using the support interval approximation process outlined above results in a visually identical approximation to that obtained using the correct support interval $[-1, 1]$. Note that for the smallest value of $\delta$, we have that $Y_{\min}$ approximates $-1$ and $Y_{\max}$ approximates $1$ to seven decimal places.

### 3.3 A non-smooth PDF

We next consider the approximation of a non-smooth PDF. Specifically, we consider the centered truncated Laplace distribution

$$ f(Y) = \frac{1}{3 C_L} \exp\!\left( -\frac{|Y|}{1.5} \right) \tag{24} $$

over $\Gamma$, where $C_L$ is a scaling factor that ensures a unitary integral of the PDF over $\Gamma$. Here, the support domain and sampling domain are the same. This distribution is merely continuous, i.e., its derivative is discontinuous at $Y = 0$, so one cannot expect optimally accurate approximations. However, as illustrated in Figure 10, it seems the approximation does converge, but at a lower rate with respect to $\delta$ and at the optimal rate with respect to $M$. The latter is not surprising because Monte Carlo sampling is largely impervious to the smoothness, or lack thereof, of the function being approximated.

Whenever there is any information about the smoothness of the PDF, one can choose an appropriate value of $r$ in (19). Alternately, possibly through a preliminary investigation, one can estimate the convergence rate of the approximation (5). In the case of the Laplace distribution, which is continuous but not continuously differentiable, one cannot expect a convergence rate greater than one. Selecting $r = 1$ in (20) to relate $M$ and $\delta$, we obtain the results given in Figure 11, which depicts rates somewhat worse than we should perhaps expect.

The Laplace distribution, although not globally smooth, is piecewise smooth, with failure of smoothness only occurring at the symmetry point of the distribution. For example, for the particular case of the centered distribution (24), the distribution is smooth for $Y < 0$ and for $Y > 0$. Thus, in general, one could build two separate, optimally accurate approximations, one to the right of the symmetry point and the other to the left of that point. Of course, doing so requires knowledge of where that point is located. If this information is not available, then one can estimate the location of that point by a process analogous to what we described in Section 3.2 for distributions whose support is not known a priori. Such a process can be extended to distributions with multiple points at which smoothness is compromised.

### 3.4 Bivariate mixed PDF

We now consider a bivariate PDF in which the random variables $Y_1$ and $Y_2$ are independently distributed according to different PDFs. Specifically, we have that $Y_1$ is distributed according to a truncated Gaussian distribution with zero mean and standard deviation $2$, whereas $Y_2$ is distributed according to a truncated standard Gaussian. We choose $\Gamma = [-5.5, 5.5]^2$ so that the joint PDF is given by

$$ f(Y) = \frac{1}{\sqrt{8\pi}\, C_G'} \exp\!\left( -\frac{Y_1^2}{8} \right) \frac{1}{\sqrt{2\pi}\, C_G} \exp\!\left( -\frac{Y_2^2}{2} \right), \tag{25} $$

where $C_G$ is as in (21) and $C_G'$ is the analogous truncation factor for the Gaussian with standard deviation $2$. Results for this case are shown in Figure 12, where we observe optimal convergence rates with respect to both $\delta$ and $M$. Visual evidence of the accuracy of our approach is given in Figure 13, which shows the approximation of the exact PDF (25) and zoom-ins of the exact and approximate PDFs. Computational times are very similar to those for $N_\Gamma = 2$ in Table 1, so they are not provided here.

## 4 Application to unknown PDFs associated with a stochastic PDE

In this section, we consider the construction of approximations of the PDF of outputs of interest that depend on the solution of a stochastic PDE. In general, such PDFs are unknown a priori.

The boundary value problem considered is the stochastic Poisson problem

$$ \begin{cases} -\nabla \cdot \big( \kappa(x, Z)\, \nabla u(x, Z) \big) = 1 & \text{for } x \in D,\ Z \in \Gamma_{\rm input} \\[2pt] u(x, Z) = 0 & \text{for } x \in \partial D,\ Z \in \Gamma_{\rm input}, \end{cases} \tag{26} $$

where $D$ denotes a spatial domain with boundary $\partial D$ and $\Gamma_{\rm input}$ is the sample space for the input random vector variable $Z$, which we assume is distributed according to a known input joint PDF $\rho(Z)$.

For the coefficient function $\kappa(x, Z)$, we assume that there exists a positive constant $\kappa_{\min}$ such that $\kappa(x, Z) \geq \kappa_{\min}$ almost surely on $D$ for all $Z \in \Gamma_{\rm input}$. We also assume that $\kappa$ is measurable with respect to $Z$. It is then known that the system (26) is well posed almost surely for $Z \in \Gamma_{\rm input}$; see, e.g., [2, 15, 21, 22] for details.

#### Stochastic Galerkin approximation of the solution of the PDE

We assume that we have in hand an approximation $u_{\rm approx}$ of the solution $u$ of the system (26). Specifically, spatial approximation is effected via a piecewise-quadratic finite element method [7, 10]. Because this aspect of our algorithm is completely standard, we do not give further details about how we effect spatial approximation. For stochastic approximation, i.e., for approximation with respect to the parameter domain $\Gamma_{\rm input}$, we employ a spectral method. Specifically, we approximate using global orthogonal polynomials, where orthogonality is with respect to $\Gamma_{\rm input}$ and the known PDF $\rho(Z)$. Thus, if $\{\Phi_j(x)\}$ denotes a basis for the finite element space used for spatial approximation and $\{\Psi_i(Z)\}$ denotes a basis for the spectral space used for approximation with respect to $Z$, the stochastic Galerkin approximation has the form

$$ u_{\rm approx}(x, Z) = \sum_i \sum_j U_{i,j}\, \Phi_j(x)\, \Psi_i(Z) = \sum_i u_i(x)\, \Psi_i(Z), \quad\text{where}\quad u_i(x) = \sum_j U_{i,j}\, \Phi_j(x). \tag{27} $$

Note that once the SGM approximation (27) is constructed, it may be evaluated at any point $x \in D$ and for any parameter vector $Z \in \Gamma_{\rm input}$. Having chosen the types of spatial and stochastic approximations we use, the approximate solution $u_{\rm approx}$, i.e., the set of coefficients $\{U_{i,j}\}$, is determined by a stochastic Galerkin method, i.e., we determine an approximation of the Galerkin projection of the exact solution with respect to both the spatial domain $D$ and the parameter domain $\Gamma_{\rm input}$. This approach is well documented, so we do not dwell on it any further; one may consult, e.g., [3, 4, 9, 14, 15], for details. Note that once the surrogate (27) for the solution of the PDE is built, it can be used through direct evaluation to cheaply determine an approximation of the solution of the PDE for any $x \in D$ and any $Z \in \Gamma_{\rm input}$, instead of having to do a new approximate PDE solve for each new choice of $x$ and $Z$.
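Once built, a surrogate of the form (27) is evaluated by summing products of precomputed coefficients and stochastic basis polynomials. The sketch below uses Legendre polynomials for $\Psi_i$ and made-up coefficients purely for illustration (in practice, the coefficients come from the stochastic Galerkin solve):

```python
def legendre(i, z):
    """Legendre polynomial P_i(z) on [-1, 1] via the three-term recurrence
    (n+1) P_{n+1} = (2n+1) z P_n - n P_{n-1}."""
    p0, p1 = 1.0, z
    if i == 0:
        return p0
    for n in range(1, i):
        p0, p1 = p1, ((2 * n + 1) * z * p1 - n * p0) / (n + 1)
    return p1

def surrogate(Z, coeffs):
    """Evaluate sum_i c_i * Psi_i(Z) at one parameter value Z, with Legendre
    Psi_i; 'coeffs' here are hypothetical, for illustration only."""
    return sum(c * legendre(i, Z) for i, c in enumerate(coeffs))

# One surrogate evaluation costs a handful of flops, versus a full PDE
# solve for each new Z without the surrogate.
value = surrogate(0.5, [1.0, 0.25, -0.125])
```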

The error in $u_{\rm approx}$ depends on the grid-size parameter used for spatial approximation and the degree of the orthogonal polynomials used for approximation with respect to the input parameter vector $Z$. In practice, these parameters should be chosen so that the two errors introduced are commensurate. However, here, because our focus is on stochastic approximation and because throughout we use the same finite element method for spatial approximation, we use a small enough spatial grid size so that the error due to spatial approximation is, for all practical purposes, negligible compared to the errors due to stochastic approximation.

#### Outputs of interest depending on the solution of the PDE

In our context, outputs of interest are spatially-independent functionals of the solution $u$ of the system (26). Here, we consider two specific functionals, the first of which is the spatial average

$$ Y(Z) = \sum_i \left( \frac{1}{|D|} \int_D u_i(x)\, dx \right) \Psi_i(Z). $$