Development of a spectral source inverse model by using generalized polynomial chaos

01/09/2018 ∙ by Kyongmin Yeo, et al.

We present a spectral inverse model to estimate a smooth source function from a limited number of observations for an advection-diffusion problem. A standard least-square inverse model is formulated by using a set of Gaussian radial basis functions (GRBF) on a rectangular mesh system. Here, the choice of the collocation points is modeled as a random variable and the generalized polynomial chaos (gPC) expansion is used to represent the random mesh system. It is shown that the convolution of gPC and GRBF provides hierarchical basis functions for a spectral source estimation. We propose a mixed l1 and l2 regularization to exploit the hierarchical nature of the basis polynomials and find a sparse solution. The spectral inverse model has an advantage over the standard least-square inverse model when the number of data is limited. It is shown that the spectral inverse model provides a good estimate of the source function even when the number of unknown parameters (m) is much larger than the number of data (n), e.g., m/n > 50.

1 Introduction

Air pollution, generated by either anthropogenic or natural causes, poses a major public health threat. Not only long-term exposure (Hoek et al., 2013) but also acute exposure (Phalen and Phalen, 2011) above a certain threshold can cause health problems. Due to its immense importance, there have been substantial developments in the computational modeling of the transport of airborne pollution over the past decade (Byun and Schere, 2006; El-Harbawl, 2013; Fast et al., 2006). However, prediction of air pollution by using these computational models requires extensive prior information on the distribution and magnitudes of pollution emission sources, which in most cases is incomplete or highly uncertain (Thunis et al., 2016). Moreover, in many cases it is of greater interest to identify the source of pollution when abnormally high pollution is observed in the air quality monitoring network, so that a possible public health hazard can be mitigated. This atmospheric inverse problem, finding the pollution emission source from a set of measurements in a sensor network, has attracted significant attention in the atmospheric science community.

One of the fundamental building blocks of the inverse model is the atmospheric dispersion process, which is modeled by an advection-diffusion equation (Stockie, 2011). Deterministic approaches, adopted from the field of atmospheric data assimilation, have been widely used for the source inverse problem (Eckhardt et al., 2008; Issartel et al., 2007; Martinez-Camara et al., 2014; Pudykiewicz, 1998). In the deterministic approaches, typically a partial-differential-equation-constrained optimization problem is solved to minimize a convex loss function, e.g., the l2-distance between the computational model prediction and the observations. Since the optimization formulation for an inverse model usually leads to an underdetermined or ill-conditioned system, much research effort has focused on regularizing the solution.

Recently, inverse models exploiting Bayesian inference have become popular (Chow et al., 2008; Keats et al., 2007; Rajaona et al., 2015), due to the strength of Bayesian methods in dealing with noisy and incomplete data. In Keats et al. (2007), the adjoint advection-diffusion operator is used to reduce the computational cost. Hwang et al. (2019) proposed an efficient Bayesian source inversion model to estimate a two-dimensional source function by exploiting the adjoint advection-diffusion operator. More general approaches to mitigate the high computational cost have been proposed by either accelerating the convergence of a Monte Carlo simulation (Marzouk et al., 2007), constructing surrogate models (Li and Marzouk, 2014), or developing a low-dimensional representation (Lieberman et al., 2010; Roosta-Khorasani et al., 2014).

Most of the previous inverse models consider either estimating the magnitudes of the source at each computational grid point by combining a large volume of heterogeneous data (de Foy et al., 2015; Hwang et al., 2018; Issartel et al., 2007), or finding the locations and magnitudes of one or a few point sources from a limited number of data (Keats et al., 2007; Marzouk et al., 2007). In this paper, we propose an inverse model based on a regularized optimization formulation to estimate a smooth source function from a small number of observations. First, we follow the conventional approach of approximating a smooth function by a set of Gaussian radial basis functions centered at the collocation points of a rectangular mesh system. Obviously, the solution of such an inverse model is strongly dependent on the choice of the mesh system. To relax the dependency on the mesh system, we introduce a random mesh system, in which the choice of the mesh system is modeled as a random variable. A stochastic inverse model is formulated on this random mesh system and the generalized polynomial chaos (gPC) expansion (Xiu, 2007) is employed to tackle the stochastic inverse problem. It is shown that the stochastic formulation leads to a spectral inverse model, in which the unknown smooth function is approximated by hierarchical basis functions. The spectral inverse model has an advantage over the standard least-square inverse model, particularly when the number of data is limited, due to its capability of p-refinement (Karniadakis and Sherwin, 2005).

This paper is organized as follows. Section 2 describes a least-square formulation of the advection-diffusion problem by using an adjoint operator. In section 3, we reformulate the deterministic least-square problem as a stochastic problem by using gPC. In section 4, a mixed l1- and l2-regularization is introduced to exploit the hierarchical nature of the basis functions, and an algorithm based on the alternating direction method of multipliers is presented to solve the optimization problem. The proposed inverse model is tested in section 5. Finally, concluding remarks are given in section 6.

2 Least-square inverse model

2.1 Forward model

We consider the following advection-diffusion problem,

(1)

Here, is a rectangular domain, in which is the coordinate of the lower left corner of the domain and and are the lengths in the x- and y-directions, respectively, with the boundary , and denotes an outward normal vector on . The outflow and inflow boundaries are defined in terms of the fluid velocity as and . The fluid velocity is assumed to be given by a measurement or a computational fluid dynamics model. The advection-diffusion operator is defined as

(2)

in which is a symmetric second-order tensor of the diffusivity. We assume that the elements of and are smooth functions with uniformly bounded derivatives of all orders. The source strength is the unknown function, but assumed to be smooth. Furthermore, we consider a non-negative source, i.e., for every .

Contrary to the usual computational prediction problem, where equation (1) is solved for known to estimate at the sensor locations, the inverse model aims to estimate from the given observations at the observation time . Here, instead of estimating directly by solving an infinite dimensional optimization problem, is approximated by the sum of a set of basis functions to reduce the dimensionality of the problem. Let be a finite-dimensional approximation,

(3)

Here, is a set of basis functions, is the total number of the basis functions and denotes the coefficients. The problem of estimating a continuous surface is reduced to a problem of finding coefficients, . There are many possible choices for the basis functions as long as satisfies the non-negativity condition: . Here, a set of Gaussian radial basis functions (GRBF) located at the collocation points of a rectangular mesh is used as the basis functions;

(4)

in which is the distance between the neighboring collocation points, is the location of the -th collocation point, and is a parameter.

Figure 1: Example of the collocation points (white dot) and a basis function .

The rectangular mesh system is defined by a tensor product of two one-dimensional collocation sets,

(5)

Here, and denote the sets of collocation points in the x- and y-directions, respectively;

in which is the left end and is the bottom end of the mesh system, and is the number of the collocation points in the -direction, . An example of and a GRBF is shown in figure 1. Obviously, in the limit of , converges to the Dirac delta function and, hence, converges uniformly to . Using GRBF, the non-negativity condition can be satisfied by for .
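Because the inline symbols in (3)-(5) were lost in extraction, the following minimal Python sketch makes the construction concrete: a tensor-product collocation mesh and a GRBF expansion of the source. The function names, the kernel normalization, and the shape parameter `gamma` are illustrative assumptions, not notation taken from the paper.

```python
import numpy as np

def collocation_points(x0, y0, Lx, Ly, Kx, Ky):
    """Tensor-product rectangular collocation mesh (cf. eq. (5))."""
    xs = x0 + np.arange(Kx) * (Lx / (Kx - 1))
    ys = y0 + np.arange(Ky) * (Ly / (Ky - 1))
    XX, YY = np.meshgrid(xs, ys, indexing="ij")
    return np.column_stack([XX.ravel(), YY.ravel()])   # shape (K, 2), K = Kx*Ky

def grbf(x, center, delta, gamma=1.0):
    """Gaussian radial basis function centered at a collocation point (cf. eq. (4)).
    `gamma` is an assumed O(1) shape parameter that sets the kernel width
    relative to the grid spacing `delta`."""
    r2 = np.sum((x - center) ** 2, axis=-1)
    return np.exp(-r2 / (gamma * delta) ** 2)

def source_estimate(x, centers, delta, w):
    """Finite-dimensional approximation of the source (cf. eq. (3)):
    a weighted sum of GRBFs; non-negativity holds whenever w >= 0."""
    phi = np.stack([grbf(x, c, delta) for c in centers], axis=-1)
    return phi @ w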

Now, the advection-diffusion equation can be written as

(6)

Since is a linear operator, we can exploit the superposition of the solutions. Let be the solution for the -th GRBF;

Then, clearly,

Moreover, from the linearity of ,

(7)

Here, is the solution for with a unit strength, i.e., .

The computational model output is related to the observation by an inner product with respect to a sensor function () as

(8)

in which is the observation at the -th sensor, is the location of the -th sensor, is the time of the measurement, is the total number of the sensors, and is a Gaussian white noise representing the errors in the measurement as well as the computational model. The angle bracket denotes an inner product

The sensor function depends on the type of sensor and the data used in the analysis. In this study, the sensor function is defined as

(9)

Here, is the Heaviside function, which is zero for a negative argument and one otherwise, and is the time-averaging window of the sensor.
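As a concrete reading of (8)-(9), an observation is the concentration at a sensor location averaged over a trailing window of length tau and corrupted by Gaussian noise. The sketch below assumes the concentration time series at the sensor is already available and uniformly sampled; all names are hypothetical.

```python
import numpy as np

def sensor_reading(c_at_sensor, t, T_obs, tau, noise_std=0.0, rng=None):
    """Time-averaged sensor observation (cf. eqs. (8)-(9)): average the
    concentration over the trailing window [T_obs - tau, T_obs] and add
    Gaussian measurement noise.  `c_at_sensor` and `t` are 1-D arrays of the
    concentration time series and its (uniform) sample times at the sensor."""
    mask = (t >= T_obs - tau) & (t <= T_obs)   # Heaviside-type time window
    d = float(np.mean(c_at_sensor[mask]))      # time average over the window
    if noise_std > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        d += noise_std * rng.standard_normal()
    return d
```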

By comparing (7) and (8), can be related to the measurement by

(10)

Here, and

(11)

The dispersion matrix relates to the observation . Since can be computed by solving exactly the same partial differential equation (2) for each GRBF (), the same numerical solver can be recycled.

The source strength can be obtained by finding from the following least-square minimization problem

(12)

in which is a regularization term. Since we consider the case , is a rank-deficient matrix and a regularization is required to guarantee the uniqueness of the solution. We refer to (12) as the least-square (LS) inverse model.
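A minimal sketch of the LS inverse model (12) is given below, with the dispersion matrix, observations, and coefficients written as G, d, and w because the paper's symbols were lost in extraction. The exact regularization term in (12) is not recoverable, so a Tikhonov term is assumed here, and the non-negativity of the source is imposed directly on the coefficients.

```python
import numpy as np
from scipy.optimize import nnls

def ls_inverse(G, d, lam=1e-3):
    """One possible form of the regularized LS inverse model (12):
    minimize ||G w - d||^2 + lam * ||w||^2 subject to w >= 0.
    The Tikhonov term is an assumption about the regularization in (12)."""
    K = G.shape[1]
    # Augment the system so the Tikhonov term becomes extra "observations".
    G_aug = np.vstack([G, np.sqrt(lam) * np.eye(K)])
    d_aug = np.concatenate([d, np.zeros(K)])
    w, _ = nnls(G_aug, d_aug)
    return w
```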

2.2 Adjoint model

It is important to note that computing requires solving the advection-diffusion equation times, which makes it computationally impractical as becomes large. Moreover, as will be discussed in section 3, when the model uncertainty is considered, the total number of computations easily blows up to . To circumvent these difficulties, an adjoint model is employed in this study. Because it reduces the number of repeated simulations from the number of GRBFs, , to the number of observations, , the adjoint model is computationally more tractable (Keats et al., 2007).

Here, the adjoint model is briefly described. Define a conjugate field () as

(13)

Then, the adjoint operator is obtained from the Lagrangian duality relation;

(14)

which gives

(15)

for . The adjoint model (15) is solved backward in time from . For more details, see Pudykiewicz (1998).

Once the -th conjugate field is computed by solving (15) with appropriate boundary conditions (Hourdin et al., 2006), at the -th sensor is computed as

(16)

Repeating the process for conjugate fields, the observation vector is

(17)

in which

It is trivial to show that . The coefficients can be computed by solving the same least-square minimization problem (12).
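The practical payoff of the adjoint formulation is that the dispersion matrix can be assembled from only n conjugate fields, one per sensor, instead of one forward solve per GRBF. A sketch, assuming the time-integrated conjugate fields and the GRBFs are available on the same computational grid and that the inner product reduces to an area-weighted sum; the array layout and normalization are assumptions, not details from the paper.

```python
import numpy as np

def dispersion_matrix_adjoint(adjoint_fields, grbf_values, cell_area):
    """Assemble the dispersion matrix from the adjoint (conjugate) fields.
    adjoint_fields : (n, Ngrid) array, the time-integrated conjugate field for
                     each of the n sensors, flattened on the grid.
    grbf_values    : (Ngrid, K) array, each GRBF evaluated on the same grid.
    cell_area      : scalar quadrature weight of one grid cell.
    Entry (i, j) is the spatial inner product of the i-th conjugate field with
    the j-th GRBF, approximated by an area-weighted sum."""
    return cell_area * (adjoint_fields @ grbf_values)   # shape (n, K)
```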

3 Generalized polynomial chaos for model uncertainty

The Gaussian radial basis function, , distributes the source strength computed from (12) in the space centered on the collocation points of . Obviously, the solution of an LS inverse model depends on the choice of . For example, if a local peak of does not coincide with one of the collocation points, the LS inverse model will result in poor accuracy. In general, there is no standard rule for choosing . In this study, we propose to represent the uncertainty in the choice of as a random variable.

Let be a random variable with a uniform distribution;

(18)

Here, and are real random variables defined over a probability space , in which is the sample space, is the σ-algebra, is the probability measure, and is an element of . Both and are defined over with the probability density functions , , i.e., for , 2. Note that corresponds to a random translation of , which uniformly covers the entire computational domain. In the absence of prior information on the source location, a natural choice is to give equal probability to every possible .

On the random collocation system, , (17) becomes

(19)

in which

and

Here, we aim to model by the first moment of the stochastic system. Although it is possible to develop a model matching higher moments, it will lead to a complex non-convex optimization problem. Taking an expectation over , (19) becomes

(20)

Then, a least-square inverse model can be formulated as

(21)

in which

Note that, since we consider only the first moment, the non-negativity condition is imposed only on the expectation of . Hereafter, the obvious dependence on is omitted in the expectation, i.e., .

Following Xiu (2007) and Xiu and Karniadakis (2002), the generalized polynomial chaos (gPC) expansion is employed to approximate the stochastic functions;

(22)
(23)

Here, is an orthonormal polynomial basis in a bivariate polynomial space () constructed by a tensor product of one-dimensional polynomial spaces

in which

As is a uniform random variable, the Legendre polynomials are chosen as the basis polynomials (Xiu and Karniadakis, 2002).

Figure 2: First four modes of the one-dimensional gPC coefficients, , for . is normalized by .

From the gPC approximation,

(24)

in which , , and denotes the total number of the basis functions excluding the mean (zero-th order) component, . Here,

(25)

And, the source function is

(26)

The gPC mode of GRBF, , is

(27)

The coefficients can easily be computed by using a numerical integration such as Gaussian quadrature. For low-order modes, can even be computed analytically. For example, the first mode of the expansion is

(28)

Figure 2 shows the first four modes of . It is shown that constitutes a set of spatially hierarchical basis functions to approximate . The advantage of using the hierarchical basis functions over increasing the number of GRBFs () is discussed in section 4.
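A sketch of how the gPC modes of the GRBF in (27) can be obtained by Gauss-Legendre quadrature is given below, shown in one dimension to match figure 2. It assumes the random shift in [-1, 1] translates a collocation point by xi*delta/2 (i.e., uniformly over one grid cell) and that the Legendre basis is orthonormal with respect to the uniform density on [-1, 1]; both are assumptions about details not recoverable from the text.

```python
import numpy as np
from numpy.polynomial import legendre

def gpc_mode_1d(x, x_i, delta, k, gamma=1.0, nquad=32):
    """k-th gPC (Legendre) mode of a 1-D GRBF with a uniformly random mesh
    shift (cf. eq. (27)), computed by Gauss-Legendre quadrature.
    x : 1-D array of evaluation points; x_i : collocation point; delta : grid
    spacing; gamma : assumed shape parameter of the GRBF."""
    nodes, weights = legendre.leggauss(nquad)
    # Orthonormal Legendre polynomial of degree k w.r.t. density rho = 1/2.
    Pk = legendre.Legendre.basis(k)(nodes) * np.sqrt(2 * k + 1)
    shift = nodes * delta / 2.0                       # assumed cell-wide shift
    phi = np.exp(-((x[:, None] - (x_i + shift[None, :])) ** 2)
                 / (gamma * delta) ** 2)
    # E[phi * P_k] with the uniform density rho(xi) = 1/2 on [-1, 1].
    return 0.5 * phi @ (weights * Pk)
```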

4 Regularized optimization formulation

Using the gPC expansion, the minimization problem (21) becomes,

(29)

In the minimization problem, the number of parameters to estimate is . Note that, if we use the forward simulation approach shown in section 2.1, even for and , the total number of numerical simulations to compute becomes . On the other hand, using the adjoint model, the total number of numerical simulations remains , which is , and can be computed efficiently by evaluating the inner product with a numerical integration.

In (29), should satisfy the non-negativity constraint, for every . Because it is difficult to directly impose the non-negativity condition for every , we propose an indirect constraint based on the stochastic collocation approximation (Xiu and Hesthaven, 2005). In the stochastic collocation method, the stochastic functions are approximated by the Lagrange polynomials as

(30)
(31)

Here, is the -th Lagrange polynomial, which is for the -th collocation point in , . Again, is constructed by the tensor product of two one-dimensional Lagrange polynomials in and . The stochastic collocation method corresponds to a deterministic sampling, and the coefficients are easily computed by the function evaluations at each collocation point in , i.e., and . Since the GRBF is positive, for all , the non-negativity condition implies for . In other words, in the stochastic collocation approach, it is sufficient to impose the non-negativity constraint only on the parameters, not on the field. The stochastic collocation coefficients, , are related to the modal gPC coefficients, , as

Then, the constraint on the smooth function surface for every can be approximated by the following linear constraint

(32)

Or,

(33)

in which is a block diagonal matrix for . In this study, we choose and (32) is evaluated at the Chebyshev nodes.
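The linear constraint (32)-(33) amounts to evaluating the gPC expansion attached to each GRBF at a tensor grid of Chebyshev nodes and requiring the resulting nodal values to be non-negative. A sketch of how such a constraint matrix could be built is given below, assuming a tensor-product ordering of the modes and an orthonormal Legendre basis; both are assumptions, not details stated in the text.

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.linalg import block_diag

def nonneg_constraint_matrix(K, P, n_cheb):
    """Linear constraint matrix for the non-negativity condition (cf. (32)-(33)).
    For each of the K GRBFs, the modal (Legendre) coefficients are mapped to
    nodal values at a tensor grid of Chebyshev nodes in (xi1, xi2); requiring
    those nodal values to be >= 0 approximates non-negativity in every
    realization of the random mesh.  P is the maximum 1-D polynomial order."""
    # Chebyshev nodes on [-1, 1]
    xi = np.cos((2 * np.arange(1, n_cheb + 1) - 1) * np.pi / (2 * n_cheb))
    # 1-D orthonormal Legendre evaluations, shape (n_cheb, P + 1)
    V1 = np.column_stack([legendre.Legendre.basis(k)(xi) * np.sqrt(2 * k + 1)
                          for k in range(P + 1)])
    # Bivariate (tensor-product) evaluation matrix, shape (n_cheb^2, (P+1)^2)
    V2 = np.kron(V1, V1)
    # One identical block per GRBF: L w >= 0 with w the stacked modal coefficients.
    return block_diag(*([V2] * K))
```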

As an analogy to finite element analysis (Karniadakis and Sherwin, 2005), either h- or p-type refinement can be used to increase the resolution of the proposed inverse model. Let be the number of GRBFs for a reference case, i.e., the number of collocation points of . In the h-type refinement, the grid spacing is decreased as for , which makes the total number of unknown parameters . In the p-type refinement, is fixed and the maximum order of is increased, which results in for . Because of this quadratic dependence, in both h- and p-type refinements, the number of unknown parameters can easily overwhelm the number of observations upon a refinement. For example, in the numerical experiments in section 5, the number of unknown parameters, i.e., the dimension of , is , while the number of data is . As a result, the optimization formulation results in a highly ill-posed system, whose solution relies heavily on the choice of the regularization. To alleviate this difficulty, we develop a regularization strategy that exploits the hierarchical nature of the GRBF-gPC coefficients, ,

As shown in (28), the zero-th gPC mode of GRBF, , represents the average source strength in centered at . Because , the non-negativity constraint implies that

(34)

At the same time, when the mean source strength, , is zero, the variation around the mean, represented by the higher-order gPC modes, should also be zero, i.e.,

(35)

Therefore, in the p-type refinement, the number of unknown parameters can be effectively reduced by identifying the non-zero elements in .

From these observations, we propose the following mixed l1- and l2-regularization,

(36)

in which . The Least Absolute Shrinkage and Selection Operator (LASSO), or l1-regularization, is one of the most widely used regularization methods to find such a “sparse” solution for the so-called “large m, small n” problem, i.e., a large number of parameters and a small number of data (Tibshirani, 1996). As discussed above, for the p-type refinement, applying LASSO only to is enough to guarantee a sparse solution. Hence, LASSO is applied only to the zeroth mode, , and the higher-order terms are regularized by the standard Tikhonov regularization.

From the non-negativity constraint, we know that . Then, the regularized optimization problem can be written as

(37)

The first tuning parameter controls the sparsity of the solution, , and the second one prevents overfitting by regularizing the total variation around the mean.

Furthermore, to account for the spatial smoothness of , the fused LASSO is used (Tibshirani et al., 2005). In the fused LASSO, an l1-norm penalty is imposed on the difference between directly connected parameters. For example, in the x-direction, the fused LASSO regularization is

(38)

in which is an index set for directly connected in the x-direction, i.e.,

In other words, the fused LASSO is equivalent to imposing an l1-penalty on the gradient of . We can define the difference matrix in the y-direction, , similarly to . Then, the regularization can be written as a mixed generalized LASSO (Tibshirani and Taylor, 2011) and Tikhonov regularization,

(39)

in which

The coefficient determines the relative weight of the standard LASSO to the fused LASSO.
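A sketch of the difference matrices entering the fused-LASSO/generalized-LASSO penalty in (38)-(39) is given below, assuming the zero-th-mode coefficients are ordered row by row on the Kx-by-Ky collocation mesh with the x index varying fastest; the ordering and the relative-weight parameter alpha are assumptions for illustration.

```python
import numpy as np

def fused_lasso_difference(Kx, Ky):
    """First-difference matrices for the fused-LASSO penalty (cf. (38)-(39)).
    Dx (Dy) differences directly connected neighbors in the x (y) direction;
    stacking them with the identity gives the generalized-LASSO matrix D."""
    idx = np.arange(Kx * Ky).reshape(Ky, Kx)      # idx[j, i] -> grid point (i, j)
    rows_x, rows_y = [], []
    for j in range(Ky):
        for i in range(Kx - 1):
            r = np.zeros(Kx * Ky)
            r[idx[j, i]], r[idx[j, i + 1]] = -1.0, 1.0
            rows_x.append(r)
    for j in range(Ky - 1):
        for i in range(Kx):
            r = np.zeros(Kx * Ky)
            r[idx[j, i]], r[idx[j + 1, i]] = -1.0, 1.0
            rows_y.append(r)
    return np.array(rows_x), np.array(rows_y)

# The full penalty matrix combines the plain LASSO (identity) and the fused
# LASSO differences, weighted by an assumed relative-weight parameter alpha:
#   D = np.vstack([np.eye(Kx * Ky), alpha * Dx, alpha * Dy])
```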

Finally, the optimization problem for the p-type source reconstruction is

(40)

Note that the coupling between the modes shown in (35) is not explicitly imposed in the regularization. However, the sparsity in the modal domain is implicitly imposed through the linear constraint. This optimization problem is solved by using the Alternating Direction Method of Multipliers (ADMM), one of the most widely used methods for large-scale optimization (Boyd et al., 2010).

Rewriting (40) as a constrained optimization problem,

(41)

where , , , and . Note that is padded with a null matrix , because the generalized LASSO, , is applied only to . The subscript indicates a projection onto a positive set , i.e., . The augmented Lagrangian form is

(42)

in which is a constant. Then, the solution procedure for the minimization problem is as follows;

  1. Set initial conditions for and other variables.

    Here, is an identity matrix of the dimension of , whose first elements are zero, which imposes the Tikhonov regularization on . Then,

  2. For , update by

  3. Next, update , i.e., and by

    Here, is a vector of dimension , i.e., the number of GRBFs and the number of quadrature points for the evaluation of the non-negativity condition. The operator takes only the upper elements of the vector for the generalized LASSO, and selects the lower elements of the vector for the non-negativity constraint.

  4. Finally, update ,

  5. Repeat steps 2 to 4, until the improvement

    for a pre-specified tolerance level .

The Tikhonov regularization is imposed in Step 2. In Step 3, the fused LASSO is imposed on , and the non-negativity constraint is imposed by a projection onto the non-negative set in .
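The sketch below assembles the steps above into a compact ADMM loop, assuming the problem has the form min_w 0.5*||G w - d||^2 + lam1*||D w0||_1 + 0.5*lam2*||R w||^2 subject to L w >= 0, with the splitting z = A w and A stacking the (zero-padded) generalized-LASSO matrix on top of the non-negativity constraint matrix. Variable names, the penalty parameter rho, and the stopping rule are illustrative; this is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_source_inverse(G, d, D, L, K, lam1, lam2, rho=1.0,
                        n_iter=500, tol=1e-6):
    """ADMM sketch for the regularized problem (40) under the assumed form above.
    w stacks the zero-th-mode coefficients (first K entries) and the higher-order
    gPC coefficients; R zeroes the first K entries so Tikhonov acts only on the
    higher modes; D acts on the zero-th mode only; L is the constraint of (33)."""
    m = G.shape[1]
    R = np.eye(m)
    R[:K, :K] = 0.0                                    # Tikhonov on higher modes only
    D_pad = np.hstack([D, np.zeros((D.shape[0], m - K))])
    A = np.vstack([D_pad, L])                          # splitting variable z = A w
    n1 = D_pad.shape[0]                                # rows handled by the LASSO prox
    z = np.zeros(A.shape[0])
    u = np.zeros_like(z)
    H = G.T @ G + lam2 * (R.T @ R) + rho * (A.T @ A)   # fixed w-update matrix
    Gtd = G.T @ d
    w = np.zeros(m)
    for _ in range(n_iter):
        # Step 2: w-update (least squares + Tikhonov).
        w = np.linalg.solve(H, Gtd + rho * A.T @ (z - u))
        Aw = A @ w
        # Step 3: z-update (generalized LASSO + non-negativity projection).
        z_new = Aw + u
        z_new[:n1] = soft_threshold(z_new[:n1], lam1 / rho)
        z_new[n1:] = np.maximum(z_new[n1:], 0.0)
        # Step 4: dual update.
        u += Aw - z_new
        # Step 5: stop when the iterate stops improving.
        if np.linalg.norm(z_new - z) <= tol * max(np.linalg.norm(z), 1.0):
            z = z_new
            break
        z = z_new
    return w
```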

5 Numerical experiments

Figure 3: (a) The computational domain and the locations of the sensors. (b) Examples of a few time-integrated adjoint fields, .

We use numerical simulations to study the behavior of the spectral inverse model, and the results are compared with the standard least-square inverse models (12) with LASSO and fused-LASSO regularizations (Tibshirani et al., 2005; Tibshirani, 1996). The computational domain for this case study is . The adjoint equation (15) is numerically integrated for by using a third-order low-storage Runge-Kutta method and an upwind finite volume method. The computational grid and time step sizes are chosen as and .

Figure 3 (a) shows the computational domain as well as the locations of the sensors. The total number of observations is . The sensors are located on a rectangular mesh whose grid size is . The time-averaging window of the sensor in equation (9) is and the time of observation is . The computational parameters are chosen to be similar to real operational air pollution measurement conditions, in which a high-frequency sensor measurement is usually averaged over one to three hours to remove measurement noise.

Figure 4: (a) True source surface together with the sensor numbers. (b) The observation () at each sensor.

A few adjoint functions are shown in figure 3 (b). To mimic the atmospheric dispersion process, the wind field is generated by a Fourier transform of a set of Ornstein-Uhlenbeck processes. The velocity in the x-direction is

(43)

in which is the maximum number of Fourier modes and is the wavenumber. The Fourier coefficients are computed by solving the following Langevin equation;

(44)

in which is a relaxation timescale, S is a scale parameter, and denotes the Wiener process. In this study, and are used (Pope, 2000). The velocity in the y-direction is computed from the divergence-free condition;

(45)

In other words,

(46)
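A sketch of the wind synthesis in (43)-(44): each Fourier coefficient of the x-velocity is advanced with an Euler-Maruyama step of the Langevin (Ornstein-Uhlenbeck) equation, and the velocity is recovered by summing the modes. The wavenumber convention, the mode normalization, and the complex-noise choice are assumptions; the y-velocity obtained from the divergence-free condition (45)-(46) is omitted here because its exact form is not recoverable from the text.

```python
import numpy as np

def ou_fourier_wind(Lx, Nk, T_rel, S, dt, n_steps, nx=128, rng=None):
    """Synthesize the x-velocity of (43)-(44): the Fourier coefficients follow
    independent Ornstein-Uhlenbeck processes,
        d a_k = -(a_k / T_rel) dt + S dW_k,
    and u(x, t) is the (real part of the) sum of the Fourier modes."""
    rng = np.random.default_rng() if rng is None else rng
    kappa = 2.0 * np.pi * np.arange(1, Nk + 1) / Lx        # assumed wavenumbers
    x = np.linspace(0.0, Lx, nx, endpoint=False)
    a = np.zeros(Nk, dtype=complex)                        # Fourier coefficients
    u_hist = np.empty((n_steps, nx))
    for n in range(n_steps):
        # Euler-Maruyama step of the Langevin equation (44).
        dW = np.sqrt(dt) * (rng.standard_normal(Nk) + 1j * rng.standard_normal(Nk))
        a += -(a / T_rel) * dt + S * dW
        # Fourier synthesis of the x-velocity (43).
        u_hist[n] = np.real(np.exp(1j * np.outer(x, kappa)) @ a)
    return x, u_hist
```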

The mean components are also obtained by solving the same Langevin equation, but with