1 Introduction
Air pollution, generated by either anthropogenic or natural causes, poses a major public health threat. Not only long-term (Hoek et al., 2013) but also acute exposure (Phalen and Phalen, 2011) over a certain threshold can cause health problems. Due to its immense importance, there has been substantial development in the computational modeling of the transport of airborne pollution over the past decade (Byun and Schere, 2006; ElHarbawl, 2013; Fast et al., 2006). However, prediction of air pollution by using these computational models requires extensive prior information on the distribution and magnitudes of pollution emission sources, which in most cases is incomplete or highly uncertain (Thunis et al., 2016). Moreover, in many cases, it is of greater interest to identify the source of pollution when abnormally high pollution is observed in the air quality monitoring network, so that a possible public health hazard can be mitigated. This atmospheric inverse problem, finding the pollution emission source from a set of measurements collected by a sensor network, has attracted significant attention in the atmospheric science community.
One of the fundamental building blocks of the inverse model is the atmospheric dispersion process, which is modeled by an advection-diffusion equation (Stockie, 2011). Deterministic approaches, adopted from the field of atmospheric data assimilation, have been used widely for the source inverse problem (Eckhardt et al., 2008; Issartel et al., 2007; Martinez-Camara et al., 2014; Pudykiewicz, 1998). In the deterministic approaches, typically a partial-differential-equation-constrained optimization problem is solved to minimize a convex loss function, e.g., the distance between the computational model prediction and the observations. Since the optimization formulation for an inverse model usually leads to an underdetermined or ill-conditioned system, much research effort is focused on regularizing the solution. Recently, inverse models exploiting Bayesian inference have become popular
(Chow et al., 2008; Keats et al., 2007; Rajaona et al., 2015), due to the strength of the Bayesian methods in dealing with noisy and incomplete data. In Keats et al. (2007), the adjoint advection-diffusion operator is used to reduce the computational cost. Hwang et al. (2019) proposed an efficient Bayesian source inversion model to estimate the two-dimensional source function by exploiting the adjoint advection-diffusion operator. More general approaches to mitigate the high computational cost have been proposed by either accelerating the convergence of a Monte Carlo simulation (Marzouk et al., 2007), constructing surrogate models (Li and Marzouk, 2014), or developing a low-dimensional representation (Lieberman et al., 2010; Roosta-Khorasani et al., 2014). Most of the previous inverse models consider either estimating the magnitudes of the source at each computational grid point by combining a large volume of heterogeneous data (de Foy et al., 2015; Hwang et al., 2018; Issartel et al., 2007), or finding the locations and magnitudes of one or a few point sources from a limited number of data (Keats et al., 2007; Marzouk et al., 2007). In this paper, we propose an inverse model based on a regularized optimization formulation to estimate a smooth source function from a small number of observations. First, we follow the conventional approach of approximating a smooth function by a set of Gaussian radial basis functions centered at the collocation points of a rectangular mesh system. Obviously, the solution of the inverse models is strongly dependent on the choice of the mesh system. To relax the dependency on the mesh system, we introduce a random mesh system, in which the choice of the mesh system is modeled as a random variable. A stochastic inverse model is formulated on this random mesh system and the generalized Polynomial Chaos expansion (gPC) (Xiu, 2007) is employed to tackle the stochastic inverse problem.
It is shown that the stochastic formulation leads to an inverse model in which the unknown smooth function is approximated by hierarchical basis functions. The inverse model has an advantage over the standard least-square inverse model, particularly when the number of data is limited, due to its capability of refinement (Karniadakis and Sherwin, 2005).
This paper is organized as follows. Section 2 describes a least-square formulation of the advection-diffusion problem by using an adjoint operator. In section 3, we reformulate the deterministic least-square problem as a stochastic problem by using gPC. In section 4, a mixed $\ell_1$ and $\ell_2$ regularization is introduced to exploit the hierarchical nature of the basis functions and an algorithm based on the alternating direction method of multipliers is presented to solve the optimization problem. The proposed inverse model is tested in section 5. Finally, the concluding remarks are given in section 6.
2 Least-square inverse model
2.1 Forward model
We consider the following advection-diffusion problem,
(1) $\dfrac{\partial c}{\partial t} + \mathcal{A} c = S(\mathbf{x}) \quad \text{in } \Omega \times (0, T], \qquad c(\mathbf{x}, 0) = 0.$
Here, $\Omega$ is a rectangular domain $[x_d, x_d + L_x] \times [y_d, y_d + L_y]$, in which $(x_d, y_d)$ is the coordinate of the lower left corner of the domain and $L_x$ and $L_y$ are the lengths in the $x$ and $y$ directions, respectively, with the boundary $\Gamma$, and
$\mathbf{n}$ denotes an outward normal vector on
$\Gamma$. The outflow and inflow boundaries are defined in terms of the fluid velocity $\mathbf{u}$ as $\Gamma^+ = \{\mathbf{x} \in \Gamma : \mathbf{u} \cdot \mathbf{n} > 0\}$ and $\Gamma^- = \{\mathbf{x} \in \Gamma : \mathbf{u} \cdot \mathbf{n} \le 0\}$. The fluid velocity $\mathbf{u}$ is assumed to be given by a measurement or a computational fluid dynamics model. The advection-diffusion operator $\mathcal{A}$ is defined as
(2) $\mathcal{A} c = \nabla \cdot (\mathbf{u} c) - \nabla \cdot (K \nabla c),$
in which
$K$ is a symmetric second-order tensor of the diffusivity. We assume that the elements of
$K$ and $\mathbf{u}$ are smooth functions with uniformly bounded derivatives of all orders. The source strength $S(\mathbf{x})$ is the unknown function, but it is assumed to be smooth. Furthermore, we consider a nonnegative source, i.e., $S(\mathbf{x}) \ge 0$ for every $\mathbf{x} \in \Omega$. Contrary to the usual computational prediction problem, where equation (1) is solved for known $S$ to estimate the concentration $c$ at the sensor locations, the inverse model aims to estimate $S$ from the observations given at the observation time. Here, instead of estimating $S$ directly by solving an infinite-dimensional optimization problem, $S$ is approximated by the sum of a set of basis functions to reduce the dimensionality of the problem. Let $S_N$ be a finite-dimensional approximation,
(3) $S_N(\mathbf{x}) = \sum_{i=1}^{N} s_i \phi_i(\mathbf{x}).$
Here, $\{\phi_i\}$ is a set of basis functions, $N$ is the total number of the basis functions and $s_i$ denotes the coefficients. The problem of estimating a continuous surface is reduced to a problem of finding $N$ coefficients, $\mathbf{s} = (s_1, \ldots, s_N)^T$. There are many possible choices for the basis functions as long as $S_N$ satisfies the nonnegativity condition: $S_N(\mathbf{x}) \ge 0$. Here, a set of Gaussian radial basis functions (GRBF) located at the collocation points of a rectangular mesh is used as the basis functions;
(4) $\phi_i(\mathbf{x}) = \frac{1}{2 \pi (\gamma \Delta)^2} \exp\left( - \frac{\| \mathbf{x} - \mathbf{x}_i \|^2}{2 (\gamma \Delta)^2} \right),$
in which $\Delta$ is the distance between the neighboring collocation points, $\mathbf{x}_i$ is the location of the $i$-th collocation point and $\gamma$ is a shape parameter.
The rectangular mesh system $\mathcal{K}$ is defined by a tensor product of two one-dimensional collocation sets,
(5) $\mathcal{K} = K_x \times K_y.$
Here, $K_x$ and $K_y$ denote the sets of collocation points in the $x$- and $y$-directions, respectively;
$K_x = \{ x_0 + (i-1)\Delta : i = 1, \ldots, N_x \}, \quad K_y = \{ y_0 + (j-1)\Delta : j = 1, \ldots, N_y \},$
in which $x_0$ is the left and $y_0$ is the bottom end of the mesh system, and $N_x$ ($N_y$) is the number of the collocation points in the $x$ ($y$) direction, $N = N_x N_y$. An example of $\mathcal{K}$ and GRBF is shown in figure 1. Obviously, in the limit of $\Delta \rightarrow 0$, $\phi_i$ converges to the Dirac delta function and, hence, $S_N$ converges uniformly to $S$. Using GRBF, the nonnegativity condition can be satisfied by $s_i \ge 0$ for $i = 1, \ldots, N$.
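As a concrete illustration, the mesh (5) and the GRBF approximation (3)-(4) can be sketched in a few lines of numpy; the Gaussian normalization and the shape parameter `gamma` below are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def rect_mesh(x0, y0, delta, nx, ny):
    """Tensor-product collocation points of a rectangular mesh, cf. eq. (5)."""
    xs = x0 + delta * np.arange(nx)
    ys = y0 + delta * np.arange(ny)
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    return np.column_stack([X.ravel(), Y.ravel()])

def grbf_basis(points, centers, delta, gamma=1.0):
    """Evaluate Gaussian radial basis functions phi_i at the given points.

    Returns an (M, N) matrix with entries phi_i(x_m).  The width gamma*delta
    and the Gaussian normalization are illustrative assumptions.
    """
    width2 = (gamma * delta) ** 2
    diff = points[:, None, :] - centers[None, :, :]
    r2 = np.sum(diff**2, axis=-1)
    return np.exp(-r2 / (2.0 * width2)) / (2.0 * np.pi * width2)

# S_N(x) = sum_i s_i phi_i(x) with nonnegative coefficients s_i
centers = rect_mesh(0.0, 0.0, delta=0.25, nx=5, ny=5)
s = np.abs(np.random.default_rng(0).normal(size=centers.shape[0]))
x_eval = np.array([[0.5, 0.5], [0.1, 0.9]])
S_N = grbf_basis(x_eval, centers, delta=0.25) @ s
```

Since every $\phi_i$ is positive, $s_i \ge 0$ is sufficient for $S_N(\mathbf{x}) \ge 0$ everywhere, which is the point exploited by the constraint above.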
Now, the advection-diffusion equation can be written as
(6) $\dfrac{\partial c}{\partial t} + \mathcal{A} c = \sum_{i=1}^{N} s_i \phi_i(\mathbf{x}).$
Since $\mathcal{A}$ is a linear operator, we can exploit the superposition of the solutions. Let $c_i$ be the solution for the $i$-th GRBF;
$\dfrac{\partial c_i}{\partial t} + \mathcal{A} c_i = s_i \phi_i(\mathbf{x}).$
Then, clearly,
$c = \sum_{i=1}^{N} c_i.$
Moreover, from the linearity of $\mathcal{A}$,
(7) $c(\mathbf{x}, t) = \sum_{i=1}^{N} s_i \hat{c}_i(\mathbf{x}, t).$
Here, $\hat{c}_i$ is the solution for $\phi_i$ with a unit strength, i.e., $s_i = 1$.
The computational model output is related to the observation by an inner product with respect to a sensor function ($h_j$) as
(8) $y_j = \langle c, h_j \rangle + \epsilon_j, \quad j = 1, \ldots, m,$
in which $y_j$ is the observation at the $j$-th sensor, $\mathbf{x}^s_j$ is the location of the $j$-th sensor, $T$ is the time of the measurement, $m$ is the total number of the sensors, and
$\epsilon_j$ is a Gaussian white noise representing the errors in the measurement as well as the computational model. The angle bracket denotes an inner product
$\langle a, b \rangle = \int_0^T \int_\Omega a(\mathbf{x}, t) \, b(\mathbf{x}, t) \, d\mathbf{x} \, dt.$
The sensor function depends on the types of the sensor or the data used in the analysis. In this study, the sensor function is defined as
(9) $h_j(\mathbf{x}, t) = \frac{1}{\tau} \, \delta(\mathbf{x} - \mathbf{x}^s_j) \, H\!\left( t - (T - \tau) \right).$
Here $H(t)$ is a Heaviside function, which is zero for $t < 0$ and one otherwise, and $\tau$ is a time-average window of the sensor.
By comparing (7) and (8), $\mathbf{s}$ can be related to the measurement by
(10) $\mathbf{y} = D \mathbf{s} + \boldsymbol{\epsilon}.$
Here, $\mathbf{y} = (y_1, \ldots, y_m)^T$ and
(11) $D_{ji} = \langle \hat{c}_i, h_j \rangle.$
The dispersion matrix $D$ relates the source coefficients $\mathbf{s}$ to the observation $\mathbf{y}$. Since $\hat{c}_i$ can be computed by solving exactly the same partial differential equation (6) for each GRBF ($i = 1, \ldots, N$), the same numerical solver can be recycled.
The source strength can be obtained by finding $\mathbf{s}$ from the following least-square minimization problem
(12) $\min_{\mathbf{s} \ge 0} \; \| \mathbf{y} - D \mathbf{s} \|_2^2 + \lambda R(\mathbf{s}),$
in which $R(\mathbf{s})$ is a regularization. Since we consider the case $m < N$, $D$ is a rank-deficient matrix and a regularization is required to guarantee the uniqueness of the solution. We refer to (12) as a least-square (LS) inverse model.
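A minimal sketch of an LS inverse model in the spirit of (12) is shown below, with a Tikhonov ($\ell_2$) regularization and the nonnegativity constraint; folding the regularization into an augmented nonnegative least-squares problem solved by `scipy.optimize.nnls` is our illustrative choice, not the paper's solver.

```python
import numpy as np
from scipy.optimize import nnls

def ls_inverse(D, y, lam):
    """Tikhonov-regularized nonnegative least squares, a sketch of eq. (12).

    Solves min_{s >= 0} ||y - D s||^2 + lam * ||s||^2 by stacking the
    regularization term into an augmented least-squares system.
    """
    N = D.shape[1]
    A = np.vstack([D, np.sqrt(lam) * np.eye(N)])
    b = np.concatenate([y, np.zeros(N)])
    s, _ = nnls(A, b)
    return s

# Rank-deficient toy problem: m = 3 observations, N = 8 unknowns
rng = np.random.default_rng(1)
D = rng.normal(size=(3, 8))
s_true = np.zeros(8)
s_true[2] = 1.0
y = D @ s_true
s_hat = ls_inverse(D, y, lam=1e-3)
```

Without the regularization the augmented matrix would be rank deficient ($m < N$); the $\sqrt{\lambda} I$ block restores a unique minimizer.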
2.2 Adjoint model
It is important to note that computing $D$ requires solving the advection-diffusion equation $N$ times, which becomes computationally impractical as $N$ grows. Moreover, as will be discussed in section 3, when the model uncertainty is considered, the total number of computations blows up even further. To circumvent these difficulties, an adjoint model is employed in this study. Reducing the number of repetitive computations from the number of GRBFs, $N$, to the number of observations, $m$, an adjoint model is computationally more tractable (Keats et al., 2007).
Here, the adjoint model is briefly described. Define a conjugate field ($c^*_j$) as
(13) $y_j = \langle S_N, c^*_j \rangle + \epsilon_j.$
Then, the adjoint operator $\mathcal{A}^*$ is obtained from the Lagrangian duality relation;
(14) $\langle \mathcal{A} c, c^* \rangle = \langle c, \mathcal{A}^* c^* \rangle,$
which gives
(15) $-\dfrac{\partial c^*_j}{\partial t} + \mathcal{A}^* c^*_j = h_j,$
for $t \in [0, T)$. The adjoint model (15) is solved backward in time from $t = T$. For more details, see Pudykiewicz (1998).
Once the $j$-th conjugate field is computed by solving (15) with appropriate boundary conditions (Hourdin et al., 2006), the model prediction at the $j$-th sensor is computed as
(16) $y_j = \sum_{i=1}^{N} s_i \langle \phi_i, c^*_j \rangle + \epsilon_j.$
Repeating the process for $m$ conjugate fields, the observation vector is
(17) $\mathbf{y} = D^* \mathbf{s} + \boldsymbol{\epsilon},$
in which
$D^*_{ji} = \langle \phi_i, c^*_j \rangle.$
It is trivial to show that $D^* = D$. The coefficients $\mathbf{s}$ can be computed by solving the same least-square minimization problem (12).
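The computational saving can be illustrated with a discrete analogue: for a linear forward map, one adjoint solve per sensor reproduces the same row of the dispersion matrix as $N$ forward solves. Everything below (matrix sizes, the random operator) is a toy assumption, not the paper's discretization.

```python
import numpy as np

# Discrete analogue of the adjoint trick.  If the forward model is a linear
# solve c_i = A^{-1} phi_i, a sensor functional h . c_i can be computed either
# with N forward solves (one per basis function) or with a single solve of the
# transposed (adjoint) system per sensor.
rng = np.random.default_rng(2)
n = 50                                           # grid unknowns
N = 12                                           # basis functions
A = np.eye(n) + 0.1 * rng.normal(size=(n, n))    # illustrative discrete operator
Phi = rng.normal(size=(n, N))                    # columns: discretized GRBFs
h = rng.normal(size=n)                           # discretized sensor function

# Forward approach: N solves, then inner products -> one row of D
D_forward = h @ np.linalg.solve(A, Phi)

# Adjoint approach: one solve with A^T, then N cheap inner products
c_star = np.linalg.solve(A.T, h)
D_adjoint = c_star @ Phi
```

The two rows agree because $h^T A^{-1} \Phi = (A^{-T} h)^T \Phi$, the discrete counterpart of the duality relation (14).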
3 Generalized polynomial chaos for model uncertainty
The Gaussian radial basis function, $\phi_i$, distributes the source strength computed from (12) in the space centered on the collocation points of $\mathcal{K}$. Obviously, the solution of a LS inverse model depends on the choice of $\mathcal{K}$. For example, if a local peak of $S$ does not coincide with one of the collocation points, the LS inverse model will result in poor accuracy. In general, there is no standard rule for choosing $\mathcal{K}$. In this study, we propose to represent the uncertainty in the choice of $\mathcal{K}$ as a random variable.
Let
$\boldsymbol{\xi} = (\xi_1, \xi_2)$ be a random variable with a uniform distribution;
(18) $\xi_1, \, \xi_2 \sim \mathcal{U}\!\left( -\tfrac{\Delta}{2}, \tfrac{\Delta}{2} \right).$
Here, $\xi_1$ and
$\xi_2$ are real random variables defined over a probability space
$(\Theta, \mathcal{F}, P)$, in which $\Theta$ is the sample space, $\mathcal{F}$ is the $\sigma$-algebra, $P$ is the probability measure, and $\theta$ is an element of $\Theta$. Both $\xi_1$ and $\xi_2$ are defined over $[-\Delta/2, \Delta/2]$ with the probability density functions $\rho_i(\xi_i) = 1/\Delta$ for $i = 1$, 2. Note that $\boldsymbol{\xi}$ corresponds to a random translation of $\mathcal{K}$, which uniformly covers the entire computational domain. In the absence of prior information on the source location, a natural choice would be to give an equal probability to every possible $\mathcal{K}$. On the random collocation system, $\mathcal{K}(\boldsymbol{\xi})$, (17) becomes
(19) $\mathbf{y} = D^*(\boldsymbol{\xi}) \, \mathbf{s}(\boldsymbol{\xi}) + \boldsymbol{\epsilon},$
in which
$D^*_{ji}(\boldsymbol{\xi}) = \langle \phi_i(\mathbf{x}; \boldsymbol{\xi}), c^*_j \rangle$
and
$\phi_i(\mathbf{x}; \boldsymbol{\xi}) = \phi_i(\mathbf{x} - \boldsymbol{\xi}).$
Here, we aim to model
$\mathbf{y}$ by the first moment of the stochastic system. Although it is possible to develop a model matching higher moments, it will lead to a complex non-convex optimization problem. Taking an expectation over
$\boldsymbol{\xi}$, (19) becomes
(20) $\mathbf{y} = E[ D^*(\boldsymbol{\xi}) \, \mathbf{s}(\boldsymbol{\xi}) ] + \boldsymbol{\epsilon}.$
Then, a least-square inverse model can be formulated as
(21) $\min_{\mathbf{s}(\boldsymbol{\xi})} \; \left\| \mathbf{y} - E[ D^*(\boldsymbol{\xi}) \, \mathbf{s}(\boldsymbol{\xi}) ] \right\|_2^2 + \lambda R(\mathbf{s}),$
in which $R$ is a regularization.
Note that, since we consider only the first moment, the nonnegative condition is imposed only on the expectation of $S_N$. Hereafter, the obvious dependence on $\theta$ is omitted in the expectation, i.e., $E[\cdot] = E_{\boldsymbol{\xi}}[\cdot]$.
Following Xiu (2007); Xiu and Karniadakis (2002), the generalized polynomial chaos (gPC) expansion is employed to approximate the stochastic functions;
(22) $\mathbf{s}(\boldsymbol{\xi}) \simeq \sum_{k=0}^{P} \mathbf{s}_k \, \Psi_k(\boldsymbol{\xi}),$
(23) $\phi_i(\mathbf{x}; \boldsymbol{\xi}) \simeq \sum_{k=0}^{P} \phi_{ik}(\mathbf{x}) \, \Psi_k(\boldsymbol{\xi}).$
Here, $\Psi_k$ is an orthonormal polynomial basis in a bivariate polynomial space constructed by a tensor product of one-dimensional polynomial spaces,
in which
$E[ \Psi_k \Psi_l ] = \delta_{kl}.$
As $\boldsymbol{\xi}$ is a uniform random variable, the Legendre polynomial is chosen as the basis polynomial (Xiu and Karniadakis, 2002).
From the gPC approximation,
(24) $\mathbf{y} = \sum_{k=0}^{P} D^*_k \, \mathbf{s}_k + \boldsymbol{\epsilon} = \widetilde{D} \, \widetilde{\mathbf{s}} + \boldsymbol{\epsilon},$
in which $\widetilde{D} = [ D^*_0, \ldots, D^*_P ]$, $\widetilde{\mathbf{s}} = (\mathbf{s}_0^T, \ldots, \mathbf{s}_P^T)^T$, and $P$ denotes the total number of the basis functions excluding the mean (zeroth order) component, $\Psi_0$. Here,
(25) $D^*_{k,ji} = \langle \phi_{ik}, c^*_j \rangle.$
And, the source function is
(26) $E[ S_N(\mathbf{x}; \boldsymbol{\xi}) ] = \sum_{i=1}^{N} \sum_{k=0}^{P} s_{ik} \, \phi_{ik}(\mathbf{x}).$
The gPC mode of GRBF, $\phi_{ik}$, is
(27) $\phi_{ik}(\mathbf{x}) = E[ \phi_i(\mathbf{x}; \boldsymbol{\xi}) \, \Psi_k(\boldsymbol{\xi}) ].$
The coefficients can be easily computed by using a numerical integration such as the Gaussian quadrature. For low order modes, $\phi_{ik}$ can even be computed analytically. For example, the zeroth mode of the expansion is
(28) $\phi_{i0}(\mathbf{x}) = \frac{1}{\Delta^2} \int_{-\Delta/2}^{\Delta/2} \int_{-\Delta/2}^{\Delta/2} \phi_i(\mathbf{x} - \boldsymbol{\xi}) \, d\xi_1 \, d\xi_2.$
Figure 2 shows the first four modes of $\phi_{ik}$. It is shown that $\{\phi_{ik}\}$ constitutes a set of spatial hierarchical basis functions to approximate $S$. The advantage of using the hierarchical basis functions over increasing the number of GRBFs ($N$) is discussed in section 4.
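A one-dimensional sketch of (27): the gPC modes of a randomly shifted function can be computed with Gauss-Legendre quadrature and Legendre polynomials normalized to be orthonormal under the uniform density. Function and parameter names are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def gpc_mode(phi, k, half_width, n_quad=20):
    """k-th gPC mode of a randomly shifted 1-D function, cf. eq. (27):
    phi_k(x) = E[phi(x - xi) * Psi_k(xi)], xi ~ U(-half_width, half_width),
    with Psi_k the orthonormal Legendre polynomial."""
    nodes, weights = legendre.leggauss(n_quad)      # quadrature on [-1, 1]
    xi = half_width * nodes                         # rescaled shift samples
    c = np.zeros(k + 1)
    c[k] = 1.0
    psi = legendre.legval(nodes, c) * np.sqrt(2 * k + 1)  # orthonormal Psi_k

    def phi_k(x):
        x = np.atleast_1d(x)
        vals = phi(x[:, None] - xi[None, :]) * psi[None, :]
        # factor 0.5 = (density 1/(2h)) * (interval rescaling h)
        return 0.5 * vals @ weights

    return phi_k

# Zeroth mode of a narrow Gaussian bump: a locally averaged, wider bump
g = lambda t: np.exp(-t**2 / 0.02)
phi0 = gpc_mode(g, 0, half_width=0.5)
vals = phi0(np.linspace(-1.0, 1.0, 5))
```

By orthonormality, the zeroth mode of a constant function is the constant itself and all higher modes of a constant vanish, which is a convenient sanity check for the quadrature.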
4 Regularized optimization formulation
Using the gPC expansion, the minimization problem (21) becomes,
(29) $\min_{\widetilde{\mathbf{s}}} \; \| \mathbf{y} - \widetilde{D} \, \widetilde{\mathbf{s}} \|_2^2 + \lambda R(\widetilde{\mathbf{s}}).$
In the minimization problem, the number of parameters to estimate is $N(P+1)$. Note that, if we use the forward simulation approach shown in section 2.1, the total number of numerical simulations to compute $\widetilde{D}$ becomes the number of GRBFs times the number of quadrature points, which quickly becomes intractable. On the other hand, using the adjoint model, the total number of the numerical simulations remains as $m$, and $\widetilde{D}$ can be computed efficiently by evaluating the inner products with a numerical integration.
In (29), $E[S_N]$ should satisfy the nonnegativity constraint, $E[ S_N(\mathbf{x}; \boldsymbol{\xi}) ] \ge 0$ for every $\mathbf{x} \in \Omega$. Because it is difficult to directly impose the nonnegativity condition for every $\mathbf{x}$, we propose an indirect constraint based on the stochastic collocation approximation (Xiu and Hesthaven, 2005). In the stochastic collocation method, the stochastic functions are approximated by the Lagrange polynomials as
(30) $\mathbf{s}(\boldsymbol{\xi}) \simeq \sum_{q} \mathbf{s}^c_q \, L_q(\boldsymbol{\xi}),$
(31) $\phi_i(\mathbf{x}; \boldsymbol{\xi}) \simeq \sum_{q} \phi_i(\mathbf{x}; \boldsymbol{\xi}_q) \, L_q(\boldsymbol{\xi}).$
Here, $L_q$ is the $q$-th Lagrange polynomial, which is one at the $q$-th collocation point in $\boldsymbol{\xi}$ and zero at the other collocation points, $L_q(\boldsymbol{\xi}_r) = \delta_{qr}$. Again, $L_q$ is constructed by the tensor product of two one-dimensional Lagrange polynomials in $\xi_1$ and $\xi_2$. The stochastic collocation method corresponds to a deterministic sampling and the coefficients are easily computed by the function evaluations at each collocation point in $\boldsymbol{\xi}$, i.e., $\mathbf{s}^c_q = \mathbf{s}(\boldsymbol{\xi}_q)$ and $\phi_i(\mathbf{x}; \boldsymbol{\xi}_q)$. Since GRBF is positive, $\phi_i(\mathbf{x}; \boldsymbol{\xi}_q) > 0$ for all $\mathbf{x}$, the nonnegativity condition implies $s^c_{iq} \ge 0$ for every $i$ and $q$. In other words, in the stochastic collocation approach, it is sufficient to impose the nonnegativity constraint only on the parameters, not on the field. The stochastic collocation coefficients, $\mathbf{s}^c_q$, are related to the modal gPC coefficients, $\mathbf{s}_k$, as
$\mathbf{s}^c_q = \sum_{k=0}^{P} \mathbf{s}_k \, \Psi_k(\boldsymbol{\xi}_q).$
Then, the constraint on the smooth function surface for every $\mathbf{x}$ can be approximated by the following linear constraint
(32) $\sum_{k=0}^{P} s_{ik} \, \Psi_k(\boldsymbol{\xi}_q) \ge 0 \quad \text{for } i = 1, \ldots, N, \; q = 1, \ldots, Q.$
Or,
(33) $Z \, \widetilde{\mathbf{s}} \ge 0,$
in which $Z$ is a block diagonal matrix built from $\Psi_k(\boldsymbol{\xi}_q)$ for $i = 1, \ldots, N$. In this study, the collocation points $\boldsymbol{\xi}_q$ are chosen so that (32) is evaluated at the Chebyshev nodes.
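The modal-to-nodal map behind (32)-(33) can be sketched in one dimension: evaluating the orthonormal Legendre basis at Chebyshev nodes gives one block of the constraint matrix, and requiring the nodal values to be nonnegative imposes the constraint on the coefficients. The function name and sizes are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def modal_to_nodal_matrix(p_order, n_nodes):
    """One diagonal block of Z: maps 1-D modal gPC coefficients (orthonormal
    Legendre, degrees 0..p_order) to values at Chebyshev nodes on [-1, 1].
    Requiring (block @ s) >= 0 imposes nonnegativity at the nodes."""
    j = np.arange(1, n_nodes + 1)
    nodes = np.cos((2 * j - 1) * np.pi / (2 * n_nodes))   # Chebyshev nodes
    Z = np.empty((n_nodes, p_order + 1))
    for k in range(p_order + 1):
        c = np.zeros(k + 1)
        c[k] = 1.0
        Z[:, k] = legendre.legval(nodes, c) * np.sqrt(2 * k + 1)
    return Z

Z_block = modal_to_nodal_matrix(p_order=3, n_nodes=5)
# Modal coefficients of the constant function 1 (only the mean mode is nonzero)
s_modal = np.array([1.0, 0.0, 0.0, 0.0])
nodal = Z_block @ s_modal
```

Because only nodal values are constrained, the constraint is linear in the modal unknowns, which keeps the optimization problem convex.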
As an analogy to the finite element analysis (Karniadakis and Sherwin, 2005), either $h$- or $p$-type refinement can be used to increase the resolution of the proposed inverse model. Let $N_0$ be the number of GRBFs for a reference case, i.e., the number of collocation points of $\mathcal{K}$. In the $h$-type refinement, the grid space $\Delta$ is decreased, which increases the total number of unknown parameters quadratically. In the $p$-type refinement, $N_0$ is fixed and the maximum order of $\Psi_k$ is increased, which likewise results in a quadratic growth of the number of parameters. Because of this quadratic dependence, in both $h$- and $p$-type refinements, the number of unknown parameters can easily overwhelm the number of observations upon a refinement. For example, in the numerical experiments in section 5, the number of unknown parameters, i.e. the dimension of $\widetilde{\mathbf{s}}$, is much larger than the number of data $m$. As a result, the optimization formulation results in a highly ill-posed system, of which the solution heavily relies on the choice of the regularization. To alleviate the difficulty, we develop a regularization strategy, which exploits the hierarchical nature of the GRBF-gPC coefficients, $\widetilde{\mathbf{s}}$.
As shown in (28), the zeroth gPC mode of GRBF, $\phi_{i0}$, represents the average source strength in a grid cell centered at $\mathbf{x}_i$. Because $\phi_{i0} \ge 0$, the nonnegativity constraint implies that
(34) $s_{i0} \ge 0 \quad \text{for } i = 1, \ldots, N.$
At the same time, when the mean source strength, $s_{i0}$, is zero, the variation around the mean, represented by the higher-order gPC modes, should also be zero, i.e.,
(35) $s_{i0} = 0 \;\Rightarrow\; s_{ik} = 0 \quad \text{for } k = 1, \ldots, P.$
Therefore, in the $p$-type refinement, the number of unknown parameters can be effectively reduced by identifying nonzero elements in $\mathbf{s}_0$.
From these observations, we propose the following mixed $\ell_1$ and $\ell_2$ regularization,
(36) $R(\widetilde{\mathbf{s}}) = \lambda_1 \| \mathbf{s}_0 \|_1 + \lambda_2 \sum_{k=1}^{P} \| \mathbf{s}_k \|_2^2,$
in which $\lambda_1, \lambda_2 > 0$. The Least Absolute Shrinkage and Selection Operator (LASSO), or $\ell_1$ regularization, is one of the most widely used regularization methods to find such a "sparse" solution for the so-called "large $N$, small $m$" problem (large number of parameters and small number of data) (Tibshirani, 1996). As discussed above, for the $p$-type refinement, applying LASSO only for $\mathbf{s}_0$ is enough to guarantee a sparse solution. Hence, LASSO is applied only to the zeroth mode, $\mathbf{s}_0$, and the higher-order terms are regularized by the standard Tikhonov regularization.
From the nonnegativity constraint, we know that $\| \mathbf{s}_0 \|_1 = \sum_{i=1}^{N} s_{i0} = \mathbf{1}^T \mathbf{s}_0$. Then, the regularized optimization problem can be written as
(37) $\min_{\widetilde{\mathbf{s}}} \; \frac{1}{2} \| \mathbf{y} - \widetilde{D} \, \widetilde{\mathbf{s}} \|_2^2 + \lambda_1 \mathbf{1}^T \mathbf{s}_0 + \lambda_2 \sum_{k=1}^{P} \| \mathbf{s}_k \|_2^2, \quad \text{subject to } Z \, \widetilde{\mathbf{s}} \ge 0.$
The first tuning parameter, $\lambda_1$, controls the sparsity in the solution, $\mathbf{s}_0$, and the second one, $\lambda_2$, prevents the overfitting by regularizing the total variation around the mean.
Furthermore, to consider the spatial smoothness of $S$, the fused LASSO is used (Tibshirani et al., 2005). In the fused LASSO, an $\ell_1$ norm is imposed on the difference between directly connected parameters. For example, in the $x$ direction, the fused LASSO regularization is
(38) $\sum_{(i,j) \in \mathcal{E}_x} | s_{i0} - s_{j0} | = \| G_x \mathbf{s}_0 \|_1,$
in which $\mathcal{E}_x$ is an index set for the GRBFs directly connected in the $x$ direction, i.e., the pairs of horizontally neighboring collocation points, and $G_x$ is the corresponding difference matrix.
In other words, the fused LASSO is equivalent to imposing an $\ell_1$ penalty on the gradient of $\mathbf{s}_0$. We can define the difference matrix in the $y$ direction, $G_y$, similar to $G_x$. Then, the regularization can be written as a mixed generalized LASSO (Tibshirani and Taylor, 2011) and Tikhonov regularization,
(39) $R(\widetilde{\mathbf{s}}) = \lambda_1 \| G \mathbf{s}_0 \|_1 + \lambda_2 \sum_{k=1}^{P} \| \mathbf{s}_k \|_2^2,$
in which
$G = \begin{pmatrix} \alpha I \\ G_x \\ G_y \end{pmatrix}.$
The coefficient $\alpha$ determines the relative weight of the standard LASSO to the fused LASSO.
Finally, the optimization problem for the $p$-type source reconstruction is
(40) $\min_{\widetilde{\mathbf{s}}} \; \frac{1}{2} \| \mathbf{y} - \widetilde{D} \, \widetilde{\mathbf{s}} \|_2^2 + \lambda_1 \| G \mathbf{s}_0 \|_1 + \lambda_2 \sum_{k=1}^{P} \| \mathbf{s}_k \|_2^2, \quad \text{subject to } Z \, \widetilde{\mathbf{s}} \ge 0.$
Note that the coupling between the modes shown in (35) is not explicitly imposed in the regularization. However, the sparsity in the modal domain is implicitly imposed through the linear constraint. This optimization problem is solved by using the Alternating Direction Method of Multipliers (ADMM). ADMM is one of the most widely used methods for large-scale optimization (Boyd et al., 2010).
Rewriting (40) as a constrained optimization problem,
(41) $\min_{\widetilde{\mathbf{s}}, \mathbf{z}} \; \frac{1}{2} \| \mathbf{y} - \widetilde{D} \, \widetilde{\mathbf{s}} \|_2^2 + \lambda_2 \sum_{k=1}^{P} \| \mathbf{s}_k \|_2^2 + \lambda_1 \| \mathbf{z}_1 \|_1, \quad \text{subject to } M \widetilde{\mathbf{s}} = \mathbf{z}, \; \mathbf{z}_2 \ge 0,$
where $M = \begin{pmatrix} \widetilde{G} \\ Z \end{pmatrix}$, $\mathbf{z} = (\mathbf{z}_1^T, \mathbf{z}_2^T)^T$, and $\widetilde{G} = (G \;\; 0)$. Note that
$\widetilde{G}$ is padded with a null matrix
$0$, because the generalized LASSO, $\| G \mathbf{s}_0 \|_1$, is applied only to $\mathbf{s}_0$. The subscript $+$ indicates a projection onto a positive set, i.e., $(a)_+ = \max(a, 0)$. The augmented Lagrangian form is
(42) $L_\rho(\widetilde{\mathbf{s}}, \mathbf{z}, \mathbf{w}) = \frac{1}{2} \| \mathbf{y} - \widetilde{D} \, \widetilde{\mathbf{s}} \|_2^2 + \lambda_2 \sum_{k=1}^{P} \| \mathbf{s}_k \|_2^2 + \lambda_1 \| \mathbf{z}_1 \|_1 + \frac{\rho}{2} \| M \widetilde{\mathbf{s}} - \mathbf{z} + \mathbf{w} \|_2^2,$
in which $\rho > 0$ is a penalty parameter. Then, the solution procedure for the minimization problem is as follows;

1. Set initial conditions for $\widetilde{\mathbf{s}}^0$, $\mathbf{z}^0$, $\mathbf{w}^0$, and other variables.
2. For the iteration $n$, update $\widetilde{\mathbf{s}}$ by
$\widetilde{\mathbf{s}}^{n+1} = \left( \widetilde{D}^T \widetilde{D} + 2 \lambda_2 \widetilde{I} + \rho M^T M \right)^{-1} \left( \widetilde{D}^T \mathbf{y} + \rho M^T ( \mathbf{z}^n - \mathbf{w}^n ) \right).$
Here,
$\widetilde{I}$ is an identity matrix of the dimension of
$\widetilde{\mathbf{s}}$, whose first $N$ diagonal elements are zero, which imposes the Tikhonov regularization only on the higher-order modes.
3. Next, update $\mathbf{z}$, i.e., $\mathbf{z}_1$ and $\mathbf{z}_2$, by
$\mathbf{z}_1^{n+1} = \mathcal{S}_{\lambda_1 / \rho}\!\left( \big( M \widetilde{\mathbf{s}}^{n+1} + \mathbf{w}^n \big)_{\mathrm{upper}} \right), \qquad \mathbf{z}_2^{n+1} = \left( \big( M \widetilde{\mathbf{s}}^{n+1} + \mathbf{w}^n \big)_{\mathrm{lower}} \right)_+.$
Here, $\mathbf{z}$ is a vector whose dimension is the number of rows of $\widetilde{G}$ plus the number of the quadrature points for the evaluation of the nonnegativity condition. The soft-thresholding operator $\mathcal{S}_{\lambda_1/\rho}$ takes only the upper elements of the vector for the generalized LASSO and the projection $(\cdot)_+$ takes the lower elements of the vector for the nonnegativity constraint.
4. Finally, update the scaled dual variable $\mathbf{w}$,
$\mathbf{w}^{n+1} = \mathbf{w}^n + M \widetilde{\mathbf{s}}^{n+1} - \mathbf{z}^{n+1}.$
5. Repeat steps 2-4, until the improvement $\| \widetilde{\mathbf{s}}^{n+1} - \widetilde{\mathbf{s}}^n \|_2 < \epsilon_{tol}$ for a pre-specified tolerance level $\epsilon_{tol}$.
The Tikhonov regularization is imposed in Step 2. In Step 3, the fused LASSO is imposed on $\mathbf{z}_1$ and the nonnegativity constraint is imposed by a projection onto the nonnegative set in $\mathbf{z}_2$.
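The update loop above can be sketched for a small generalized-LASSO problem with a nonnegativity constraint. The toy problem below (identity forward map, first-difference penalty) and all parameter values are illustrative assumptions; it mirrors the structure of the algorithm (a linear-system s-update, a z-update combining soft-thresholding with a nonnegative projection, and a dual update) rather than the paper's full problem (40).

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_gen_lasso_nonneg(A, y, G, lam, rho=1.0, n_iter=500):
    """ADMM sketch for min_s 0.5*||y - A s||^2 + lam*||G s||_1, s >= 0."""
    n = A.shape[1]
    M = np.vstack([G, np.eye(n)])      # stack penalty matrix and constraint
    z = np.zeros(M.shape[0])
    w = np.zeros_like(z)               # scaled dual variable
    Q = np.linalg.inv(A.T @ A + rho * M.T @ M)
    for _ in range(n_iter):
        s = Q @ (A.T @ y + rho * M.T @ (z - w))      # s-update (linear solve)
        v = M @ s + w
        z = np.concatenate([
            soft_threshold(v[:G.shape[0]], lam / rho),  # generalized LASSO part
            np.maximum(v[G.shape[0]:], 0.0),            # nonnegative projection
        ])
        w = w + M @ s - z                            # dual update
    return s

# Toy fused-LASSO denoising of a piecewise-constant nonnegative signal
rng = np.random.default_rng(3)
s_true = np.concatenate([np.zeros(5), 2.0 * np.ones(5)])
y = s_true + 0.05 * rng.normal(size=10)
G = np.diff(np.eye(10), axis=0)        # first-difference matrix (9 x 10)
s_hat = admm_gen_lasso_nonneg(np.eye(10), y, G, lam=0.5)
```

Caching the inverse of the s-update matrix outside the loop is the standard trick: the matrix does not change across iterations, so only matrix-vector products remain inside the loop.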
5 Numerical experiments
We use numerical simulations to study the behavior of the inverse model and the results are compared with the standard least-square inverse models (12) with LASSO and fused-LASSO regularizations (Tibshirani et al., 2005; Tibshirani, 1996). The computational domain for this case study is rectangular. The adjoint equation (15) is numerically integrated backward in time by using a third-order low-storage Runge-Kutta method and an upwind finite volume method.
Figure 3 (a) shows the computational domain as well as the locations of the sensors. The total number of observations is $m$. The sensors are located at a rectangular mesh. The time averaging window of the sensor in equation (9), $\tau$, and the time of observation, $T$, are chosen to be similar to real operational air pollution measurement conditions. In an operational air pollution measurement, usually a high-frequency sensor measurement is averaged over one to three hours to remove measurement noise.
A few adjoint functions are shown in figure 3 (b). To mimic the atmospheric dispersion process, the wind field is generated by a Fourier transform of a set of Ornstein-Uhlenbeck processes. The velocity in the $x$ direction is
(43) $u(x, y, t) = \sum_{k=1}^{N_f} \hat{u}_k(t) \, e^{\mathrm{i} \boldsymbol{\kappa}_k \cdot \mathbf{x}},$
in which $N_f$ is the maximum number of the Fourier modes and $\boldsymbol{\kappa}_k$ is the wavenumber. The Fourier coefficients are computed by solving the following Langevin equation;
(44) $d \hat{u}_k = - \frac{\hat{u}_k}{T_L} \, dt + S \, dW_t,$
in which $T_L$ is a relaxation timescale, $S$ is a scale parameter, and $W_t$ denotes the Wiener process. The parameter values are chosen following Pope (2000). The velocity in the $y$ direction is computed from the divergence-free condition;
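A sketch of the Langevin integration for one Fourier coefficient, using a simple Euler-Maruyama scheme; the parameter values below are illustrative, not those of the paper.

```python
import numpy as np

def ou_path(T_L, S, dt, n_steps, rng):
    """Euler-Maruyama integration of the Langevin equation (44):
    d a = -(a / T_L) dt + S dW_t, an Ornstein-Uhlenbeck process."""
    a = np.zeros(n_steps + 1)
    dW = rng.normal(scale=np.sqrt(dt), size=n_steps)  # Wiener increments
    for n in range(n_steps):
        a[n + 1] = a[n] - (a[n] / T_L) * dt + S * dW[n]
    return a

# One Fourier coefficient path; parameter values are illustrative only
rng = np.random.default_rng(4)
a = ou_path(T_L=1.0, S=0.5, dt=0.01, n_steps=5000, rng=rng)
```

The stationary variance of this process is $S^2 T_L / 2$, which gives a quick check that the scale and relaxation parameters produce velocity fluctuations of the intended magnitude.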
(45) $\dfrac{\partial u}{\partial x} + \dfrac{\partial v}{\partial y} = 0.$
In other words,
(46) $\hat{v}_k = - \dfrac{\kappa_{k,x}}{\kappa_{k,y}} \, \hat{u}_k.$
The mean components are also obtained by solving the same Langevin equation, but with