# Ensemble-based implicit sampling for Bayesian inverse problems with non-Gaussian priors

In the paper, we develop an ensemble-based implicit sampling method for Bayesian inverse problems. For Bayesian inference, the iterative ensemble smoother (IES) and implicit sampling are integrated to obtain importance ensemble samples, which build an importance density. The proposed method shares a similar idea with importance sampling. IES is used to approximate the mean and covariance of a posterior distribution. This provides the MAP point and the inverse of the Hessian matrix, which are necessary to construct the implicit map in implicit sampling. The importance samples are generated by the implicit map, and the corresponding weights are the ratio between the importance density and the posterior density. In the proposed method, we use the ensemble samples of IES to find the optimizer of the likelihood function and the inverse of the Hessian matrix. This approach avoids the explicit computation of the Jacobian and Hessian matrices, which is very expensive in high-dimensional spaces. To treat non-Gaussian models, the discrete cosine transform and the Gaussian mixture model are used to characterize the non-Gaussian priors. The ensemble-based implicit sampling method is extended to non-Gaussian priors for exploring the posterior of the unknowns in inverse problems. The proposed method is applied to each individual Gaussian model in the Gaussian mixture model. This substantially improves the applicability of the implicit sampling method. A few numerical examples are presented to demonstrate the efficacy of the proposed method with applications to inverse problems for subsurface flow problems and anomalous diffusion models in porous media.


## 1 Introduction

The model inputs, such as parameters, the source and the system structure, are often unknown in practical models. For example, in porous media applications, practical models include subsurface flows and anomalous diffusion models [37]. The unknown model inputs can be identified by integrating noisy observations and prior information. The problem of identifying unknown inputs in mathematical models has been intensively studied in the framework of inverse problems, and many methods have been proposed [1, 30, 6]. The inverse problem can generally be considered as an optimization problem that minimizes the misfit between simulated observations and true observations. The inverse problem is often ill-posed. To avoid the ill-posedness, incorporating a penalty into the objective function is often necessary [19]. In the paper, we use Bayesian inference to solve the inverse problems. Bayesian inversion can provide not only a point estimate, but also the statistical distribution of the unknowns and prediction intervals of the state variables.

In Bayesian inference, the posterior is often concentrated in a small portion of the entire prior support. To accelerate posterior exploration, the unknown inputs can be estimated by samples from a high-probability region. As a sampling method for Bayesian inference, importance sampling generates samples from a probability density function (pdf) known only up to a multiplicative constant. The importance sampling method based on Monte Carlo has been widely used in inverse problems [32, 28]. It can be used as a method of sensitivity analysis, as an alternative to acceptance-rejection sampling, and as a foundation for computing normalizing constants of probability densities. In dynamic systems, it is also an essential prerequisite for sequential Monte Carlo [13]. However, importance sampling differs from the standard Monte Carlo method, which generates samples with equal weights. The weights of importance sampling are derived from the proposal density and are unequal. To obtain effective weights, the selection of the proposal density is critical. The proposal density is also called the importance density.

In the paper, we construct the importance density through an implicit sampling (IS) method [11], which provides a data-informed importance density. The main idea of IS is to locate the high-probability region and generate samples around the Maximum A Posteriori (MAP) point. The IS method requires computing the MAP point of the posterior and the corresponding Hessian matrix of the negative logarithm of the posterior. There are many techniques to estimate the MAP point, such as Markov chain Monte Carlo (MCMC) methods [23, 14], variational methods [24], and ensemble-based methods [21, 4]. These methods are based on the Bayesian framework and provide a statistical analysis, which can give prediction and credible intervals, the pdf and other statistical information. The inverse of the Hessian matrix can be approximated by the posterior covariance matrix. Then the importance weights can be obtained by solving an implicit equation. The effectiveness of IS for Bayesian inverse problems has been studied in [10, 31, 3].

Ensemble-based methods, such as the ensemble Kalman filter (EnKF) [17, 4] and the ensemble smoother (ES) [6], were proposed for data assimilation. In recent years, ensemble-based methods have been used to forecast the state and estimate unknown parameters. In this work, we use an ensemble-based method to find the MAP point and approximate the inverse of the Hessian matrix by ensemble samples. EnKF is widely used in Bayesian data assimilation but brings the problem of inconsistency. Although ES has no inconsistency issue, it is a global update assimilating all observations simultaneously, and it may perform poorly due to the single update. To improve the ES method, Chen and Oliver proposed the iterative ensemble smoother (IES) [8], which can deal with high-dimensional and nonlinear problems [9]. IES is still a Gaussian approximation, which provides the first-order and second-order moments of the posterior distribution. This makes it difficult to apply IES to non-Gaussian distributions.

To treat non-Gaussian models in the ensemble-based method, we can use two approaches: parameterization and non-parameterization. The parameterization approach uses a transform to obtain latent variables. Typical examples include the truncated pluri-Gaussian (TPG) [2], level set [29], multiple-point simulation (MPS) [36] and discrete cosine transform (DCT) [22]. The latent parameters are easy to update by the ensemble-based method. In a practical situation, we may only know the discretization of the physical domain for a random field with unknown covariance information. For this situation, DCT, a Fourier-based transformation, can parameterize the random field. It only depends on the discretization of the physical domain. DCT has its roots in image processing and has been widely used for image compression. To construct the DCT expansion, cosine functions are used to form a set of mutually orthogonal basis functions. To overcome the challenge of possibly high-dimensional parameters in DCT, we can use a truncated DCT, where the low-frequency basis functions are retained and the high-frequency ones are discarded. The truncated DCT can both capture the main features of the random field and improve the computational efficiency in Bayesian inversion.

Non-parameterization is another approach for handling non-Gaussian priors. One of the non-parameterization methods is semi-parametric, such as the mixture of distributions, which adopts a convex combination of several distributions to approximate a posterior distribution. Common mixtures are the Beta [5], triangular [33] and Gaussian [34, 27] mixture models. In a mixture model, each distribution can be seen as a basis function. Thus, the goal is to find a set of optimal basis functions. The corresponding coefficients are the model weights, which indicate the importance of the distributions. In the paper, we use the Gaussian mixture model (GMM) for Bayesian inversion with non-Gaussian priors. In particular, GMM can be coupled with EnKF to estimate multimodal state distributions [27]. When using GMM, the weight, mean and covariance of each Gaussian model become unknown parameters, which need to be estimated. The expectation-maximization (EM) method [7] is available to estimate these GMM parameters. To obtain a good approximation, the number of models $k$ may need to be large. However, a large $k$ brings singularity of the covariance, which results from the unbounded logarithmic form of the mixture. To avoid the singularity issue, Bayesian Ying Yang (BYY) harmony learning, based on a general statistical learning framework, has been proposed in [38]. A model selection criterion and an automatic model selection method, called the BYY harmony data smoothing learning model selection criterion (BYY-HDS) [20], can be derived from BYY harmony learning to estimate the GMM parameters. Here, $k$ can be unknown and selected by BYY-HDS when its initial value is large enough.

The goal of this paper is to combine the IES method with IS to develop an ensemble-based implicit sampling for Bayesian inverse problems. In the proposed method, we use IES to compute the mean and covariance. The mean is used as the MAP point and the covariance as the approximation of the inverse of the Hessian matrix. Then an implicit map is given by the mean and the Cholesky factorization of the covariance. The implicit map gives the importance ensemble samples, where the corresponding weights are the ratio between the importance density and the posterior density. Resampling based on these weights may be necessary to avoid ensemble degeneration. Thus, IES provides the MAP point and the inverse of the Hessian matrix, and then IS generates the importance samples. For convenience, we refer to the proposed method as IES-IS. In the paper, we apply IES-IS to non-Gaussian models based on DCT and GMM, which are used to handle the non-Gaussian priors. When using the BYY-HDS based GMM method, the ensemble samples serve as training data to estimate the mean, covariance and weight of the GMM. IES-IS is performed for each Gaussian model in the GMM. This substantially improves the applicability of the IES-IS method for Bayesian inference. For some complex structures in the target field, the direct estimate of the proposed method may not be sufficient to capture the main features. To this end, we use a post-processing based on regularization [18] to improve the connectivity of the main features. In general, the penalty term can be given by a quadratic form, which achieves the global minimum under a convex constraint on a closed interval.

The rest of the paper is organized as follows. We begin with the general framework of implicit sampling for Bayesian inverse problems. In Section 3, we focus on the non-Gaussian priors, which are handled using DCT and the BYY-HDS based GMM. Section 4 is devoted to developing IES-IS based on DCT and GMM. In Section 5, a few numerical examples are presented to illustrate the performance of the proposed method with applications to inverse problems for subsurface flow problems and anomalous diffusion models in porous media. In particular, we recover channel structures and fractures in porous media. Finally, some conclusions and comments are made.

## 2 Ensemble-based IS for Bayesian inverse problems

We assume that a model problem is defined as

$$ U(u; a(x)) = f(x), $$

where $U$ is a generic forward operator that describes the relation between the coefficient $a(x)$, the state $u$ and the source term $f(x)$. For the Bayesian inverse problem, $a(x)$ and $f(x)$ may be unknown and are assumed to be characterized by $a(x;\theta)$ and $f(x;\theta)$, respectively, in a finite-dimensional parameter space, where $\theta \in \mathbb{R}^{N_\theta}$ is the unknown parameter. Let $g$ be the forward operator mapping the model parameter to the observation space, i.e., $g: \mathbb{R}^{N_\theta} \to \mathbb{R}^{N_d}$. Then the observation model can be given by

$$ d = g(\theta) + \varepsilon, $$

where $\varepsilon$ is the observation noise. In the paper, we assume that $\varepsilon$ is independent of $\theta$. From the observation model, we have the likelihood function

$$ p(d|\theta) \propto \exp\Big(-\frac{1}{2}\big(d - g(\theta)\big)^T C_D^{-1}\big(d - g(\theta)\big)\Big), \tag{1} $$

where $C_D$ is the covariance matrix of the observation noise.

Given a prior $p(\theta)$, the conditional posterior density function can be derived by Bayes' rule,

$$ p(\theta|d) = \frac{p(\theta)\,p(d|\theta)}{\int p(\theta)\,p(d|\theta)\,d\theta}, $$

where the denominator is a constant independent of $\theta$. Let $F(\theta) = -\log\big(p(\theta)\,p(d|\theta)\big)$. The goal of the Bayesian inverse problem is to find a solution minimizing $F(\theta)$, i.e.,

$$ \hat{\theta} = \operatorname*{arg\,min}_{\theta \in \mathbb{R}^{N_\theta}} F(\theta). \tag{2} $$

Thus, $\hat{\theta}$ is the MAP point of $p(\theta|d)$. In the Bayesian framework, $\hat{\theta}$ can be approximated by the expectation of $\theta$ with respect to $p(\theta|d)$,

$$ \hat{\theta} \approx E[\theta] = \int \theta\, p(\theta|d)\, d\theta, \tag{3} $$

where $E[\cdot]$ is the expectation operator.

### 2.1 Bayesian inference using importance sampling

We can use the Monte Carlo method by drawing independent samples from $p(\theta|d)$ to approximate the integral in equation (3). Monte Carlo integration is often used if we can sample from the target distribution. However, drawing samples from the target distribution is often difficult in practice. Thus, we seek an alternative distribution that is easy to sample from. This motivates importance sampling, where the idea is to draw samples from a proposal distribution and re-weight the integral using importance weights so that the proper distribution is targeted. The proposal density is also called the importance density. Importance sampling can bring enormous gains, making an otherwise infeasible problem amenable to Monte Carlo.

When drawing samples from $p(\theta|d)$ is infeasible, we need an importance density function $q(\theta|d)$ to replace it. Then we have

$$ E[\theta] = \int \theta\, \frac{p(\theta|d)}{q(\theta|d)}\, q(\theta|d)\, d\theta = E_q[w(\theta)\theta], $$

where $w(\theta) = p(\theta|d)/q(\theta|d)$ and $E_q$ denotes the expectation with respect to $q(\theta|d)$. Then we sample from $q(\theta|d)$ instead of $p(\theta|d)$. Here the adjustment factor $w(\theta)$ is called the likelihood ratio. Thus the importance sampling estimate of (3) is given by

$$ \hat{E}_q[w(\theta)\theta] = \frac{1}{N_e}\sum_{i=1}^{N_e} w(\theta_i)\,\theta_i, $$

where $\theta_i$ is drawn from $q(\theta|d)$. In Bayesian inference, $p(\theta|d)$ is known only up to a normalizing constant, i.e., the unnormalized posterior can be evaluated but the constant is unknown. In this case, we compute the ratio estimate

$$ \tilde{E}_q[w(\theta)\theta] = \frac{\sum_{i=1}^{N_e} w(\theta_i)\,\theta_i}{\sum_{i=1}^{N_e} w(\theta_i)} $$

instead of $\hat{E}_q[w(\theta)\theta]$. Thus constructing an effective importance density function is critical in Bayesian inverse problems.
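The self-normalized (ratio) estimate above can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's implementation; the function names and the Gaussian toy target/proposal are our own choices. Working with log-weights and subtracting the maximum before exponentiating avoids overflow without changing the ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_normalized_is(log_p_tilde, q_sample, log_q, n=10_000):
    """Ratio estimate of E_p[theta] when p is known only up to a constant.

    log_p_tilde : unnormalized log target density
    q_sample    : draws n samples from the importance density q
    log_q       : log density of q
    """
    theta = q_sample(n)
    log_w = log_p_tilde(theta) - log_q(theta)
    log_w -= log_w.max()                      # stabilize the exponential
    w = np.exp(log_w)
    return np.sum(w * theta) / np.sum(w)      # ratio estimate

# Toy example: target N(0,1) (unnormalized), proposal N(0, 2^2).
est = self_normalized_is(
    log_p_tilde=lambda x: -0.5 * x**2,
    q_sample=lambda n: rng.normal(0.0, 2.0, n),
    log_q=lambda x: -0.5 * (x / 2.0)**2 - np.log(2.0 * np.sqrt(2 * np.pi)),
)
```

Since the weights are normalized by their sum, the unknown constant of the posterior cancels, which is exactly why the ratio estimate is used.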

### 2.2 Implicit sampling method

Implicit sampling [11] generates samples by an implicit map. It constructs the importance probability density function and ensures the efficacy of importance sampling. To implement IS, we first need to compute the MAP point of $p(\theta|d)$ and the Hessian matrix of $F(\theta)$, and then generate samples from the high-probability region of the posterior density. Assume that the minimum of $F$ exists. Let

$$ \varphi_F = \min F \quad \text{and} \quad \tilde{\mu} = \operatorname*{arg\,min} F. $$

We first find the high-probability region of the posterior density function by minimizing $F$. Then we generate samples around $\tilde{\mu}$.

If $g$ is nonlinear, the posterior density function may be non-Gaussian, even though the prior is Gaussian. Thus sampling from $p(\theta|d)$ may be difficult. In IS, we choose a reference random variable $\xi$ with probability density function $p_\xi$, which is easy to sample (the reference random variable is Gaussian in the paper). We assume that the minimum of $B(\xi) = -\log p_\xi(\xi)$ exists and $\varphi_B = \min B$. To generate the samples of $\theta$, we draw samples $\xi$ from $p_\xi$ and then solve the following implicit equation,

$$ F(\theta) - \varphi_F = B(\xi) - \varphi_B. \tag{4} $$

The implicit equation can be solved by many approaches, such as a random map [31], a linear map [10] and the connection with optimal maps [16]. For the different methods, the resulting samples may have different weights.

In the paper, we focus on the linear map, where $F(\theta)$ is approximated by

$$ F(\theta) \approx F_0(\theta) := \varphi_F + \frac{1}{2}(\theta - \tilde{\mu})^T H (\theta - \tilde{\mu}), \tag{5} $$

where $H$ is the Hessian matrix of $F$ at $\tilde{\mu}$. We take $\xi \sim N(0, I)$, where $I$ is the identity matrix. Then $B(\xi) - \varphi_B = \frac{1}{2}\xi^T\xi$. By equations (4) and (5), we get the approximated implicit equation

$$ F_0(\theta) - \varphi_F = \frac{1}{2}\xi^T\xi. \tag{6} $$

Let $LL^T = H^{-1}$ be the Cholesky factorization of $H^{-1}$. Then

$$ \theta = \tilde{\mu} + L\xi $$

solves equation (6). The corresponding weight of the sample $\theta$ is

$$ w \propto \exp\big(F_0(\theta) - F(\theta)\big). $$

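The linear-map construction above can be sketched directly: draw $\xi$, push it through $\theta = \tilde{\mu} + L\xi$, and weight by $\exp(F_0 - F)$. This is a minimal numpy sketch under our own naming; `F` is any vectorized negative log posterior, and the 2-D quadratic test case is only a sanity check (for an exactly Gaussian posterior all weights are equal).

```python
import numpy as np

rng = np.random.default_rng(1)

def implicit_linear_map(F, mu, H_inv, n_samples):
    """Linear-map implicit sampling: theta = mu + L xi with L L^T = H^{-1}.

    F     : negative log posterior, vectorized over rows of its argument
    mu    : MAP point (minimizer of F)
    H_inv : inverse Hessian of F at mu
    Returns samples and self-normalized weights w ∝ exp(F0 - F).
    """
    L = np.linalg.cholesky(H_inv)
    xi = rng.standard_normal((n_samples, len(mu)))
    theta = mu + xi @ L.T
    phi_F = F(mu[None, :])[0]
    F0 = phi_F + 0.5 * np.einsum("ni,ni->n", xi, xi)  # quadratic surrogate
    log_w = F0 - F(theta)
    log_w -= log_w.max()
    w = np.exp(log_w)
    return theta, w / w.sum()

# Sanity check on an exactly Gaussian posterior: all weights are equal.
H = np.array([[2.0, 0.3], [0.3, 1.0]])
mu = np.array([1.0, -0.5])
F_quad = lambda t: 0.5 * np.einsum("ni,ij,nj->n", t - mu, H, t - mu)
theta, w = implicit_linear_map(F_quad, mu, np.linalg.inv(H), 500)
```

Note the factor $L$ comes from the Cholesky factorization of $H^{-1}$, so that $L^T H L = I$ and the surrogate $F_0(\tilde{\mu} + L\xi) - \varphi_F$ reduces to $\frac{1}{2}\xi^T\xi$, as required by equation (6).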
### 2.3 IS based on the iterative ensemble smoother

The iterative ensemble smoother (IES) was proposed in [8] as an ensemble method for Bayesian inverse problems. One of the IES methods is the modified Levenberg-Marquardt method for ensemble randomized maximum likelihood (LM-EnRML) [9]. In the LM-EnRML method, a modification is made to approximate the inverse of the Hessian matrix so that explicit computation of the Jacobian matrix of $g$ is avoided. We couple IES with IS to develop the ensemble-based implicit sampling method, where IES is used to obtain the MAP point and the inverse of the Hessian matrix, and IS is used to generate high-probability samples.

From equation (1), the likelihood function belongs to the exponential family. Thus we use a mixture of natural conjugate priors to approximate any prior [12]. In the paper, we use Gaussian density functions as the natural conjugate priors. Then we have

$$ p(\theta) = \sum_{i=1}^{k} \pi_i\, p(\theta|\theta_{pr,i}, C_{\theta,i}), \qquad \pi_i > 0 \ \text{ and } \ \sum_{i=1}^{k} \pi_i = 1, $$

where $p(\theta|\theta_{pr,i}, C_{\theta,i})$ is a Gaussian density function with mean $\theta_{pr,i}$ and covariance $C_{\theta,i}$, and $\pi_i$ is the mixing probability. Thus the posterior density function

$$ p(\theta|d) \propto \Big(\sum_{i=1}^{k} \pi_i\, p(\theta|\theta_{pr,i}, C_{\theta,i})\Big)\, p(d|\theta). $$

Let $p_i(\theta|d) \propto p(\theta|\theta_{pr,i}, C_{\theta,i})\, p(d|\theta)$, i.e., each mixture component induces its own posterior. By the convex combination of posterior density functions,

$$ E_p[\theta] = \sum_{i=1}^{k} \pi_i\, E_{p_i}[\theta]. $$

To compute the expectation of $\theta$, we need to find the posterior density functions $p_i(\theta|d)$. For the ensemble method, $p_i(\theta|d)$ can be constructed by ensemble samples.

Let

$$ F_i(\theta) = -\log\big(p(\theta|\theta_{pr,i}, C_{\theta,i})\, p(d|\theta)\big) = \frac{1}{2}\big(d - g(\theta)\big)^T C_D^{-1}\big(d - g(\theta)\big) + \frac{1}{2}(\theta - \theta_{pr,i})^T C_{\theta,i}^{-1}(\theta - \theta_{pr,i}) + c, $$

where $g$ is the forward operator, $C_D$ is the covariance matrix of the observation error, and $c$ is a constant independent of $\theta$. The ensemble samples can be considered as minimizers of $F_i(\theta)$. We note that minimizing $F_i(\theta)$ is equivalent to minimizing the following function,

$$ W_i(\theta) = \frac{1}{2}\big(d - g(\theta)\big)^T C_D^{-1}\big(d - g(\theta)\big) + \frac{1}{2}(\theta - \theta_{pr,i})^T C_{\theta,i}^{-1}(\theta - \theta_{pr,i}), \quad i = 1, \cdots, k. \tag{7} $$

For most practical applications, $g$ is nonlinear. Minimizing equation (7) with all observations simultaneously is called the ensemble smoother, which is just a one-step iteration and inaccurate for high-dimensional or nonlinear problems. Thus we resort to the iterative scheme, i.e., the IES method.

Let $G_l$ be the Jacobian matrix of $g$ at the $l$-th iteration step. For the Gauss-Newton method, the gradient of equation (7) can be expressed as

$$ \nabla_\theta W_i(\theta_l) \approx \big[C_{\theta,i}^{-1}(\theta_l - \theta_{pr,i}) + G_l^T C_D^{-1}\big(g(\theta_l) - d\big)\big], $$

and the Hessian matrix is $C_{\theta,i}^{-1} + G_l^T C_D^{-1} G_l$. To avoid the influence of large data mismatch in early iterations and to accelerate convergence, we modify the Hessian matrix by the Levenberg-Marquardt method [9]. Besides, two further modifications in [25] are necessary to implement the iterative update formula. Let $\bar{G}_l$ be the Jacobian matrix at the ensemble mean $\bar{\theta}_l = \frac{1}{N_e}\sum_{j=1}^{N_e}\theta_l^j$, where $N_e$ is the ensemble size. We use $\bar{G}_l$ to replace $G_l$ in the Hessian matrix and the ensemble covariance $C_{\theta_l}$ to replace $C_{\theta,i}$. For IS based on the IES method, we have the iterative scheme

$$ \tilde{\theta}_{l+1}^j = \theta_l^j - \big((1+\lambda)C_{\theta_l}^{-1} + \bar{G}_l^T C_D^{-1}\bar{G}_l\big)^{-1}\big[C_{\theta,i}^{-1}(\theta_l^j - \theta_{pr,i}) + \bar{G}_l^T C_D^{-1}\big(g(\theta_l^j) - d\big)\big], $$

where, by the matrix inversion lemma, the data-mismatch part can be rewritten as $-C_{\theta_l}\bar{G}_l^T\big((1+\lambda)C_D + \bar{G}_l C_{\theta_l}\bar{G}_l^T\big)^{-1}\big(g(\theta_l^j) - d\big)$. Then the Kalman gain is

$$ K_l = C_{\theta_l}\bar{G}_l^T\big((1+\lambda)C_D + \bar{G}_l C_{\theta_l}\bar{G}_l^T\big)^{-1}. \tag{8} $$

The intermediate ensemble samples are generated by

$$ \tilde{\theta}_{l+1}^j = \theta_l^j - \frac{1}{1+\lambda}\big(C_{\theta_l} - K_l\bar{G}_l C_{\theta_l}\big)C_{\theta,i}^{-1}(\theta_l^j - \theta_{pr,i}) - K_l\big(g(\theta_l^j) - d\big), \quad j = 1, \cdots, N_e. \tag{9} $$

For the IS method, the inverse of the modified Hessian matrix is

$$ \tilde{H}_i^{-1} \approx \frac{1}{1+\lambda}\big(C_{\theta_l} - K_l\bar{G}_l C_{\theta_l}\big), \tag{10} $$

and the MAP point is

$$ \tilde{\mu}_i \approx \frac{1}{N_e}\sum_{j=1}^{N_e}\tilde{\theta}_{l+1}^j. \tag{11} $$

Assume that the minimum of $W_i$ exists and $\phi_{W_i} = \min W_i$. Let $\tilde{\mu}_i$ and $\tilde{H}_i^{-1}$ be given by (11) and (10). We apply equations (5) and (6) to $W_i$ and have

$$ W_i(\theta) \approx W_{i0}(\theta) = \phi_{W_i} + \frac{1}{2}\xi^T\xi. \tag{12} $$

The importance ensemble samples can be obtained by solving equation (12), i.e.,

$$ \theta_{l+1}^{j,i} = \tilde{\mu}_i + L_i\xi_j \tag{13} $$

with the weights

$$ w_j^i \propto \exp\big(W_{i0}(\theta_{l+1}^{j,i}) - W_i(\theta_{l+1}^{j,i})\big), \quad j = 1, \cdots, N_e, \ i = 1, \cdots, k, $$

where $L_iL_i^T = \tilde{H}_i^{-1}$ is the Cholesky factorization of $\tilde{H}_i^{-1}$. For discrete structures, the weights may approach infinity because of the exponential growth. To ensure a proper size of effective samples, we can scale the difference $W_{i0} - W_i$ by a scale parameter, so that the order of the weights remains unchanged. This does not affect the solution of the implicit equation. Thus the importance samples can be generated by equation (13) with the modified weights

$$ w_j^i \propto \exp\Big(\frac{W_{i0}(\theta_{l+1}^{j,i}) - W_i(\theta_{l+1}^{j,i})}{T}\Big), \quad j = 1, \cdots, N_e, \ i = 1, \cdots, k, \tag{14} $$

where $T$ denotes the scale parameter. The selection of the scale parameter is carefully discussed in [35]. To avoid ensemble collapse, we resample and redistribute the weights. For each model $i$, we obtain the importance ensemble samples $\{\theta_{l+1}^{j,i}\}_{j=1}^{N_e}$. In the end, we combine the $k$ ensembles using a membership probability matrix to get the updated ensemble.
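The weight scaling in (14) and the subsequent resampling can be sketched as follows. This is our own minimal numpy illustration (systematic resampling is one common choice; the paper does not prescribe a specific resampling scheme, and the function names are ours). Dividing the log-weight differences by a scale larger than 1 tempers the weights and raises the effective sample size without reordering them.

```python
import numpy as np

rng = np.random.default_rng(2)

def scaled_weights(log_w, scale):
    """Temper log-weight differences as in (14); the order is unchanged."""
    log_w = np.asarray(log_w) / scale
    log_w -= log_w.max()              # numerical stabilization only
    w = np.exp(log_w)
    return w / w.sum()

def systematic_resample(w, rng):
    """Indices drawn proportionally to w with one stratified uniform draw."""
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(w), positions)

log_w = np.array([0.0, -1.0, -2.0, -50.0])   # one near-degenerate particle
w = scaled_weights(log_w, scale=5.0)          # tempering lifts tiny weights
idx = systematic_resample(w, rng)             # equal-weight ensemble after
```

A quick diagnostic is the effective sample size $1/\sum_j w_j^2$, which increases as the scale parameter grows.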

## 3 Priors based on DCT and GMM

Ensemble-based methods may not work well for problems with non-Gaussian priors. To overcome this difficulty, we use suitable parameterization methods to characterize the non-Gaussian field. In the paper, we focus on DCT and GMM to treat the non-Gaussian priors. We apply the proposed IES-IS method to priors described by DCT and GMM.

### 3.1 Parameterization based on DCT

The goal of parameterization methods is to obtain latent variables by a transform, so that they can be updated by the ensemble-based method. Widely used parameterization methods are the truncated pluri-Gaussian (TPG), level set, multiple-point simulation (MPS) and discrete cosine transform (DCT). These parameterizations are applied to the unknown input to obtain the unknown parameter $\theta$. In the paper, we focus on DCT [22], which is a Fourier-based transformation and can extract the important features of a random field in the Bayesian inverse problem. As is known, parameterization by the Karhunen-Loève expansion needs the mean and covariance information of the random field. For the DCT method, however, the discretization of the physical domain of the random field is enough to construct the basis functions. Besides, the separability of the DCT basis makes the computation of the basis functions efficient.

Without loss of generality, we consider the unknown function defined on a two-dimensional spatial domain, discretized on a uniform $N_x \times N_y$ grid. Then the field can be expressed as $A(m,n)$, $m = 0, \cdots, N_x-1$, $n = 0, \cdots, N_y-1$. Thus the general forward DCT of the input field has the form

$$ \theta_{N_m}(i,j) = \frac{2\alpha(i)\alpha(j)}{\sqrt{N_xN_y}}\sum_{m=0}^{N_x-1}\sum_{n=0}^{N_y-1} A(m,n)\cos\Big[\frac{\pi(2m+1)i}{2N_x}\Big]\cos\Big[\frac{\pi(2n+1)j}{2N_y}\Big], $$

where $i = 0, \cdots, N_x-1$, $j = 0, \cdots, N_y-1$, and

$$ \alpha(i) = \begin{cases} \frac{1}{\sqrt{2}} & i = 0, \\ 1 & \text{otherwise}. \end{cases} $$

In the inversion problem, we obtain the field $A$ by estimating the unknown coefficients $\theta_{N_m}$. Thus the inverse 2D DCT is necessary for the inverse problem, as shown below,

$$ A(m,n) = \frac{2}{\sqrt{N_xN_y}}\sum_{i=0}^{N_x-1}\sum_{j=0}^{N_y-1} \alpha(i)\alpha(j)\,\theta_{N_m}(i,j)\cos\Big[\frac{\pi(2m+1)i}{2N_x}\Big]\cos\Big[\frac{\pi(2n+1)j}{2N_y}\Big], \tag{15} $$

where $m = 0, \cdots, N_x-1$, $n = 0, \cdots, N_y-1$.

Due to the separability property of the DCT basis functions, (15) can be written as

$$ A(m,n) = \sum_{i=0}^{N_x-1}\sqrt{\frac{2}{N_x}}\,\alpha(i)\Bigg\{\sum_{j=0}^{N_y-1}\sqrt{\frac{2}{N_y}}\,\alpha(j)\,\theta_{N_m}(i,j)\cos\Big[\frac{\pi(2n+1)j}{2N_y}\Big]\Bigg\}\cos\Big[\frac{\pi(2m+1)i}{2N_x}\Big]. \tag{16} $$

We can implement the DCT in vector form, i.e., $A$ and $\theta_{N_m}$ are represented as vectors in $\mathbb{R}^{N_m}$ with $N_m = N_xN_y$. Let $\Phi_{N_m}$ denote the basis function matrix with respect to $\theta_{N_m}$ and $I_{N_m}$ denote a set of natural numbers given by

$$ I_{N_m} = \{\, i * N_y + j \,\}, \quad i \in \{0, \cdots, N_x-1\}, \ j \in \{0, \cdots, N_y-1\}, $$

where all possible combinations of $i$ and $j$ are taken into account. The $r$-th column of $\Phi_{N_m}$ can be expressed as

$$ \phi_r = \frac{2\alpha(i)\alpha(j)}{\sqrt{N_m}}\begin{bmatrix} \cos\big[\frac{\pi(2\times 0+1)i}{2N_x}\big]\cos\big[\frac{\pi(2\times 0+1)j}{2N_y}\big] \\ \vdots \\ \cos\big[\frac{\pi(2\times(N_x-1)+1)i}{2N_x}\big]\cos\big[\frac{\pi(2\times(N_y-1)+1)j}{2N_y}\big] \end{bmatrix} \in \mathbb{R}^{N_m}. \tag{17} $$

Then (16) becomes

$$ A = \Phi_{N_m}\theta_{N_m} = [\phi_0, \phi_1, \cdots, \phi_{N_y-1}, \cdots, \phi_{N_m-1}]\,\theta_{N_m} = \sum_{r=0}^{N_m-1}\theta_r\phi_r, $$

where $\theta_r$ is the $r$-th component of $\theta_{N_m}$ and the subscript $r \in I_{N_m}$ corresponds to a pair of indices $(i,j)$.

From equation (17), we note that $\Phi_{N_m}$ can be pre-computed and is data-independent. Thus the DCT basis functions only need to be calculated and stored once. We choose the low-frequency basis functions in $\Phi_{N_m}$ to retain the main features of a random field. Then we reduce the dimension of the unknown parameters without losing the main features. The selection retains the low-frequency basis functions and discards the high-frequency ones. The truncated DCT expansion with the first $N_c$ terms can be represented by

$$ A = \sum_{r=0}^{N_c-1}\theta_r\phi_r = \Phi\theta, \tag{18} $$

where $\Phi$ consists of the first $N_c$ columns of $\Phi_{N_m}$ and $\theta \in \mathbb{R}^{N_c}$. Using equation (18), IES-IS can be performed in the low-dimensional stochastic subspace.
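The basis construction (17) and the truncation (18) can be sketched in numpy. This is our own illustrative sketch (the grid size and the number of retained terms are arbitrary choices); with the normalization above the columns are orthonormal, so the forward DCT is simply $\theta = \Phi_{N_m}^T A$.

```python
import numpy as np

def dct_basis(nx, ny):
    """Columns are the 2D DCT-II basis vectors phi_r, ordered r = i*ny + j."""
    def alpha(i):
        return 1.0 / np.sqrt(2.0) if i == 0 else 1.0
    m = np.arange(nx)
    n = np.arange(ny)
    phi = np.empty((nx * ny, nx * ny))
    for i in range(nx):
        ci = np.cos(np.pi * (2 * m + 1) * i / (2 * nx))
        for j in range(ny):
            cj = np.cos(np.pi * (2 * n + 1) * j / (2 * ny))
            phi[:, i * ny + j] = (2 * alpha(i) * alpha(j)
                                  / np.sqrt(nx * ny)) * np.outer(ci, cj).ravel()
    return phi

nx, ny = 8, 8
Phi = dct_basis(nx, ny)
# Orthonormality makes the forward transform a plain matrix product.
A = np.ones(nx * ny)                 # a smooth (constant) field
theta = Phi.T @ A                    # DCT coefficients
A_trunc = Phi[:, :10] @ theta[:10]   # truncated reconstruction, N_c = 10
```

For a smooth field the energy concentrates in the low-frequency columns, which is exactly why the truncation in (18) preserves the main features.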

### 3.2 Nonparametric method based on GMM

Compared with the parameterizations, the semi-parametric approach is different. A typical method is to adopt mixture models to approximate the unknown distribution. We assume that a mixture of distributions can be described as a convex combination of distributions $P_i(\theta)$,

$$ \sum_{i=1}^{k}\pi_iP_i(\theta), $$

where $\pi_i > 0$ and $\sum_{i=1}^{k}\pi_i = 1$, and the $P_i(\theta)$ are from a parametric family. For an unknown distribution, there is a trade-off between a perfect representation of the unknown distribution and a useful estimate of the mixture. To obtain a good approximation of the distribution, $k$ may need to be large. The mixture models can be considered as using a few basis distributions to approximate the unknown distribution.

Many mixtures have been applied to Bayesian inverse problems, such as the Beta, triangular and Gaussian mixture models. In the paper, we focus on the Gaussian mixture model (GMM). We assume $L(\theta)$ is a Gaussian mixture density function consisting of a convex combination of Gaussian density functions, i.e.,

$$ L(\theta) = \sum_{i=1}^{k}\pi_i\, p(\theta|\mu_i, \Sigma_i). $$

Here $p(\theta|\mu_i, \Sigma_i)$ is a Gaussian density function with mean $\mu_i$ and covariance $\Sigma_i$. The GMM parameters $\{\pi_i, \mu_i, \Sigma_i\}_{i=1}^k$ are unknown and need to be identified in the Bayesian inverse problem. Once their estimates are obtained, we have the probability density function with respect to $\theta$. The expectation of $\theta$ can be computed using convexity,

$$ E[\theta] = \sum_{i=1}^{k}\pi_i\mu_i. $$

To get an accurate estimate, we can update the GMM parameters by an iterative process. The widely used method is expectation-maximization (EM) for estimating GMM parameters with known $k$. For a sufficiently large $k$, the covariance matrices may become singular in the EM algorithm. This is an inherent problem, since the logarithmic form of the mixture is unbounded. For estimating GMM parameters with unknown $k$, the Bayesian Ying Yang harmony data smoothing (BYY-HDS) learning model selection criterion is proposed in [20]. To get a proper approximation, the BYY-HDS method can automatically screen the models by minimizing the function

$$ J_{BYY\text{-}HDS}(q_{h,k}, k) = \sum_{i=1}^{k}\pi_i\Big(\frac{1}{2}\log|\Sigma_i| + \frac{1}{2}h^2\,\mathrm{tr}[\Sigma_i^{-1}] - \log\pi_i\Big), \tag{19} $$

where $h$ is the smoothing parameter and $\mathrm{tr}[\cdot]$ denotes the trace operator of a matrix. Since $k$ is not fixed, $q_{h,k}$ denotes the parameters of the $k$ models at the current iteration. For convenience, the EM method using the BYY-HDS learning model selection criterion is called the smoothed EM (SmEM).

In the paper, we apply the SmEM method to ensemble samples instead of the observation data, as proposed in [27]. For the SmEM algorithm, we first need to set the maximum and minimum of $k$, where the maximum is large enough to automatically screen the mixture model and the minimum is larger than 1. At the $l$-th iteration step, we select $k$ and discard the models corresponding to the smaller weights $\pi_i$. Thus $k$ may decrease over the iterations. We update the GMM parameters by an E-step and an M-step. The E-step of SmEM can be expressed as

$$ \gamma_{i,j} = \frac{\pi_i\, p(\theta_j|\mu_i, \Sigma_i)}{\sum_{i=1}^{k}\pi_i\, p(\theta_j|\mu_i, \Sigma_i)}, \quad i = 1, \cdots, k, \ j = 1, \cdots, N_e. \tag{20} $$

When we get the sample membership probability matrix $[\gamma_{i,j}]$, the M-step of SmEM can be represented by

$$ \begin{cases} \pi_i = \dfrac{1}{N_e}\displaystyle\sum_{j=1}^{N_e}\gamma_{i,j}, \\[3mm] \mu_i = \dfrac{1}{N_e\pi_i}\displaystyle\sum_{j=1}^{N_e}\gamma_{i,j}\theta_j, \\[3mm] \Sigma_i = \dfrac{1}{N_e\pi_i}\displaystyle\sum_{j=1}^{N_e}\gamma_{i,j}(\theta_j - \mu_i)(\theta_j - \mu_i)^T + h_l^2 I, \end{cases} \quad i = 1, \cdots, k, \tag{21} $$

where $h_l$ is the smoothing parameter at the $l$-th iteration step. By equation (21), we note that $\Sigma_i$ is modified to avoid the singularity which often occurs in the EM method. The smoothing parameter is critical for SmEM. We use an iterative scheme to update it, i.e.,

$$ h_{l+1} = h_l + \eta S(h_l), \tag{22} $$

where $\eta$ is a step-length constant and

$$ S(h_l) = \frac{N_\theta}{h_l} - h_l\sum_{i=1}^{k}\pi_i\,\mathrm{tr}[\Sigma_i^{-1}] - \frac{\sum_{i=1}^{N_e}\sum_{j=1}^{N_e}\beta_{i,j}\|\theta_i - \theta_j\|^2}{h_l^3} $$

with

$$ \beta_{i,j} = \frac{\exp\big(-0.5\|\theta_i - \theta_j\|^2/h_l^2\big)}{\sum_{i=1}^{N_e}\sum_{j=1}^{N_e}\exp\big(-0.5\|\theta_i - \theta_j\|^2/h_l^2\big)}. $$

The detailed procedure is presented in Algorithm 1.
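One E/M sweep of (20)-(21) can be sketched in numpy. This is an illustrative sketch of the regularized update, not Algorithm 1 itself (model screening and the $h_l$ iteration (22) are omitted, and all names are ours); the point is that the $h^2 I$ term in the M-step keeps every covariance strictly positive definite.

```python
import numpy as np

rng = np.random.default_rng(3)

def smem_step(theta, pi, mu, sigma, h):
    """One E/M sweep: responsibilities (20), then smoothed updates (21).

    theta : (Ne, d) ensemble samples used as training data
    """
    ne, d = theta.shape
    k = len(pi)
    # E-step: responsibilities gamma[i, j] of model i for sample j.
    gamma = np.empty((k, ne))
    for i in range(k):
        diff = theta - mu[i]
        maha = np.einsum("nj,jk,nk->n", diff, np.linalg.inv(sigma[i]), diff)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma[i]))
        gamma[i] = pi[i] * np.exp(-0.5 * maha) / norm
    gamma /= gamma.sum(axis=0)
    # M-step with the h^2 I smoothing of the covariances.
    pi_new = gamma.sum(axis=1) / ne
    mu_new = (gamma @ theta) / (ne * pi_new[:, None])
    sigma_new = []
    for i in range(k):
        diff = theta - mu_new[i]
        cov = (gamma[i, :, None] * diff).T @ diff / (ne * pi_new[i])
        sigma_new.append(cov + h**2 * np.eye(d))   # avoids singular Sigma_i
    return pi_new, mu_new, np.array(sigma_new)

# Two well-separated clusters; one sweep already sharpens the fit.
theta = np.vstack([rng.normal(-3, 0.5, (100, 2)), rng.normal(3, 0.5, (100, 2))])
pi0 = np.array([0.5, 0.5])
mu0 = np.array([[-1.0, -1.0], [1.0, 1.0]])
sig0 = np.array([np.eye(2), np.eye(2)])
pi1, mu1, sig1 = smem_step(theta, pi0, mu0, sig0, h=0.1)
```

Even if one component collapses onto few samples, the added $h^2 I$ floor keeps its covariance invertible, which is the singularity fix that plain EM lacks.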

### 3.3 Post-processing for discrete structure

For discrete structures, the immediate results of the above method may not be good enough. To this end, we use a post-processing based on regularization to improve the connectivity of the important features. It is implemented in a block-by-block manner, so we perform the post-processing for each gridblock. Let $\tilde{A}_i$ denote the value of the $i$-th gridblock. The goal of post-processing is to minimize the following function with respect to $A$,

$$ G(A) = (\tilde{A}_i - A)^2 + \tau T(A), \quad A \in [A_l, A_u], \ 0 < \tau < 1, \ i = 1, \cdots, N_m, \tag{23} $$

where $T(A)$ denotes the regularization term, $\tau$ the corresponding regularization weight, and $[A_l, A_u]$ is the domain of definition. We need to find the minimizer $\hat{A}_i$ of $G(A)$, which is the estimate for the $i$-th gridblock. For equation (23), the penalty term is applied to penalize values away from $A_l$ or $A_u$. Thus, the selection of the regularization term is important. In practice, we often use the following quadratic form for $T(A)$,

$$ T(A) = (b_1A - b_2)(b_3 - b_4A), $$

where $b_1, b_2, b_3, b_4$ depend on $A_l$ and $A_u$. Then, $G(A)$ can be expressed as a quadratic form, i.e.,

$$ G(A) = (1 - \tau b_1b_4)\Big(A - \frac{2\tilde{A}_i - \tau(b_1b_3 + b_2b_4)}{2(1 - \tau b_1b_4)}\Big)^2 + c, $$

where $c$ is a constant independent of $A$. To obtain the global minimum, we impose the constraint $1 - \tau b_1b_4 > 0$. For the minimizer of the convex function in each gridblock, we note that there exist three cases: if the vertex $\frac{2\tilde{A}_i - \tau(b_1b_3 + b_2b_4)}{2(1 - \tau b_1b_4)}$ is less than $A_l$, then $\hat{A}_i = A_l$; if it lies in $[A_l, A_u]$, then $\hat{A}_i$ equals the vertex; if it is greater than $A_u$, then $\hat{A}_i = A_u$. Then, $\hat{A}_i$ attains the global minimum and is the estimate in the $i$-th gridblock.
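The three cases collapse into a single clipping of the vertex to $[A_l, A_u]$, which can be sketched as follows. This is our own illustration; the particular values of $b_1, \ldots, b_4$ and $\tau$ below are hypothetical choices (giving $T(A) = A(1-A)$, which penalizes values away from the bounds 0 and 1), not values prescribed by the paper.

```python
import numpy as np

def post_process(A_tilde, tau, b1, b2, b3, b4, A_low, A_up):
    """Blockwise minimizer of G(A) = (A_tilde - A)^2 + tau*(b1*A - b2)*(b3 - b4*A).

    Requires 1 - tau*b1*b4 > 0 so that G is convex; the unconstrained
    vertex is then clipped to [A_low, A_up], covering all three cases.
    """
    a = 1.0 - tau * b1 * b4
    assert a > 0, "convexity constraint violated"
    vertex = (2.0 * A_tilde - tau * (b1 * b3 + b2 * b4)) / (2.0 * a)
    return np.clip(vertex, A_low, A_up)

# Push a noisy binary facies field back toward its bounds {0, 1}.
field = np.array([0.1, 0.45, 0.8, 1.2, -0.3])
cleaned = post_process(field, tau=0.5, b1=1.0, b2=0.0, b3=1.0, b4=1.0,
                       A_low=0.0, A_up=1.0)
```

With these illustrative coefficients the vertex is $2\tilde{A}_i - 0.5$, so mid-range values are pushed toward the nearest bound while values already near a bound are snapped to it.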

## 4 IES-IS for non-Gaussian priors

In this section, we present IES-IS for Bayesian inversion with priors described by DCT and GMM. IES is a sampling method that generates ensemble samples to efficiently estimate the MAP point. We note that IES is a Gaussian approximation. To improve the effectiveness of the ensemble samples, we use the IS method to get a data-informed importance density. IS does not depend on any Gaussian assumption; it is an importance sampling method that can find samples with high probability. The importance samples are generated by the implicit equation, which requires the MAP point and the Hessian matrix. To avoid the computation of the Jacobian matrix, we use the ensemble mean as the MAP point and approximate the inverse of the Hessian matrix by the Monte Carlo method. We perform a resampling method to avoid ensemble degeneracy. The proposed IES-IS method is used to deal with non-Gaussian Bayesian inverse problems by using DCT and the SmEM-based GMM.

### 4.1 IES-IS based on DCT

In this section, we use DCT to parameterize the unknown function and obtain the prior information. Each column of the basis function matrix is given by (17). To perform the proposed algorithm efficiently, we use the truncated DCT expansion (18). The target field is parameterized by $\theta$ through DCT. Besides, the post-processing is applied to improve the connectivity of the inversion field. To further improve the efficiency, we use a criterion to reduce the dimension of $\theta$ over the iterations.

Let the prior be Gaussian. This corresponds to the case of $k = 1$ in the GMM described in Subsection 2.3. At the $l$-th iteration step, we use the Monte Carlo method in [26] to approximate the covariance matrices. Then

$$ \begin{cases} C_{\theta_l} \approx \dfrac{1}{N_e}\displaystyle\sum_{j=1}^{N_e}(\theta_l^j - \bar{\theta}_l)(\theta_l^j - \bar{\theta}_l)^T, \\[3mm] C_{\theta_lD_l} \approx C_{\theta_l}\bar{G}_l^T \approx \dfrac{1}{N_e}\displaystyle\sum_{j=1}^{N_e}(\theta_l^j - \bar{\theta}_l)\big(g(\hat{a}_l^j) - g(\bar{a}_l)\big)^T, \\[3mm] C_{D_lD_l} \approx \bar{G}_lC_{\theta_l}\bar{G}_l^T \approx \dfrac{1}{N_e}\displaystyle\sum_{j=1}^{N_e}\big(g(\hat{a}_l^j) - g(\bar{a}_l)\big)\big(g(\hat{a}_l^j) - g(\bar{a}_l)\big)^T, \end{cases} \tag{24} $$

where $\bar{\theta}_l = \frac{1}{N_e}\sum_{j=1}^{N_e}\theta_l^j$, and $\hat{a}_l^j$ and $\bar{a}_l$ denote the fields reconstructed from $\theta_l^j$ and $\bar{\theta}_l$, respectively, via the truncated DCT (18). By substituting (24) into (8), (9) and (10), we obtain the intermediate ensemble samples

$$ \tilde{\theta}_{l+1}^j = \theta_l^j - \frac{1}{1+\lambda}\big(C_{\theta_l} - K_lC_{\theta_lD_l}^T\big)C_\theta^{-1}(\theta_l^j - \theta_{pr}) - K_l\big(g(\hat{a}_l^j) - d\big), \quad j = 1, \cdots, N_e. $$
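The ensemble approximations in (24) and the damped Kalman gain (8) can be sketched as follows. This is our own minimal numpy illustration; the linear stand-in for the forward model and the centering of the outputs by their ensemble mean (a common simplification of $g(\bar{a}_l)$) are assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(4)

def ensemble_covariances(theta, g_vals):
    """Monte Carlo approximations of the covariances in (24).

    theta  : (Ne, n_param) parameter ensemble
    g_vals : (Ne, n_obs) forward-model outputs for each member
    """
    ne = theta.shape[0]
    d_theta = theta - theta.mean(axis=0)
    d_g = g_vals - g_vals.mean(axis=0)       # simplification of g(a_bar)
    c_theta = d_theta.T @ d_theta / ne       # C_theta
    c_theta_d = d_theta.T @ d_g / ne         # C_theta G^T
    c_dd = d_g.T @ d_g / ne                  # G C_theta G^T
    return c_theta, c_theta_d, c_dd

def kalman_gain(c_theta_d, c_dd, c_obs, lam):
    """Kalman gain (8) with Levenberg-Marquardt damping lambda."""
    return c_theta_d @ np.linalg.inv((1 + lam) * c_obs + c_dd)

theta = rng.standard_normal((200, 3))
g_vals = theta @ rng.standard_normal((3, 2))   # a linear stand-in for g
ct, ctd, cdd = ensemble_covariances(theta, g_vals)
K = kalman_gain(ctd, cdd, np.eye(2) * 0.01, lam=1.0)
```

No Jacobian of the forward model is ever formed: all three covariances come from the ensemble anomalies, which is precisely what makes the scheme tractable in high dimensions.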