On the Incorporation of Box-Constraints for Ensemble Kalman Inversion

08/02/2019 ∙ by Neil K. Chada, et al. ∙ National University of Singapore ∙ University of Mannheim

The Bayesian approach to inverse problems is widely used in practice to infer unknown parameters from noisy observations. In this framework, ensemble Kalman inversion has been successfully applied for the quantification of uncertainties in various areas of application. In recent years, a complete analysis of the method has been developed for linear inverse problems, adopting an optimization viewpoint. However, many applications require the incorporation of additional constraints on the parameters, arising e.g. from physical restrictions. We propose a new variant of ensemble Kalman inversion to include box constraints on the unknown parameters, motivated by the theory of projected preconditioned gradient flows. Based on the continuous time limit of the constrained ensemble Kalman inversion, we discuss a complete convergence analysis for linear forward problems. We adopt techniques from filtering, such as variance inflation, which are crucial in order to improve the performance and establish a correct descent direction. These benefits are highlighted through a number of numerical examples on various inverse problems based on partial differential equations.


1. Introduction

Inverse problems aim to recover a quantity of interest from perturbed, noisy measurements. Traditionally, inverse problems were solved in a least-squares manner, where the solution corresponds to the minimizer of a misfit functional [14]. A more recent approach for solving inverse problems is statistical: the quantity of interest is described by a probability distribution constructed via Bayes' theorem, known as the Bayesian approach, see e.g. [19, 35]. The advantage of the latter is that it allows the quantification of uncertainty in the solution through statistical properties. Common algorithms designed for Bayesian inversion include sampling-based methods such as Markov chain Monte Carlo (MCMC) methods, but also variational Bayes' methods. One recent method aimed at solving Bayesian inverse problems through data assimilation methodologies [24, 29] is ensemble Kalman inversion (EKI).

EKI can be viewed as the application of the ensemble Kalman filter (EnKF) [16, 15, 21, 25] to inverse problems. It was first proposed by Iglesias et al. [18, 26] as a derivative-free solver for inverse problems, offering a cheaper approximation of the solution compared to traditional methods. EKI provides a unique perspective: although aimed at Bayesian inversion, its methodology is not based on sampling like MCMC, but is instead variational in nature. Since its formulation, a number of research directions have been pursued, such as applications, theoretical developments and uncertainty quantification techniques [8, 9, 11, 17, 31]. However, the basic version of the EKI does not allow one to incorporate additional constraints on the parameters, which arise in many applications due to additional knowledge on the system. In the following we will focus on the efficient incorporation of box constraints. An example of this type of constraint arises in hierarchical EKI [11], where the hyperparameters are often defined through a uniform distribution. It is well known that the EKI may lead to estimates of the unknown parameters which are unfeasible, i.e. the estimates do not satisfy the box constraints imposed by the uniform distribution on the hierarchical parameters. As a result, there is a strong motivation to study the incorporation of constraints for the EKI.

Introducing constrained optimization for the EnKF has been a challenge which has drawn increasing attention in recent years. A literature overview of existing methods for the treatment of linear and nonlinear constraints for Kalman-based methods can be found in [1, 34]. The projection of the estimates to the feasible set is a very common approach, see e.g. [20, 37], which can be generalized to nonlinear constraints by linearization ideas. Most of the variants are motivated by interpreting the Kalman-based updates as solutions of a corresponding optimization problem, see [1] for more details. This viewpoint allows one to straightforwardly include constraints on the parameters and states. In [13], the authors suggest a new approach to handle linear equality and inequality constraints for the EnKF and EKI by reparameterizing the solution of the optimization problem in the range of the covariance, i.e. by seeking the solution of the optimization problem in a subspace defined by the initial ensemble. In this work, we will adopt the optimization viewpoint, i.e. we will view the EKI as a derivative-free optimizer for the least-squares misfit functional (cp. [31]), and motivate the incorporation of additional constraints on the unknown parameters via projection-based methods onto the feasible set. To illustrate the idea, we focus in the following on the incorporation of box constraints, i.e. we will introduce a variant of the EKI ensuring that the estimate of the unknown parameter remains within a box of the parameter domain. Our work builds on the theory by Bertsekas [6, 7] and others [32, 33] for projected preconditioned gradient methods. For inverse problems, box-constrained optimization has been applied in different contexts and for different applications; we refer the reader to [28].

We will show that a simple projection of the EKI estimate onto the feasible set does not necessarily lead to a convergent scheme. To remedy this, we propose using techniques from data assimilation, such as variance inflation [2, 3, 22], to ensure that one attains a correct descent direction. From this, our analysis will consist of an existence and uniqueness result, a quantification of the ensemble collapse and the convergence of the residuals. We aim to compare this modified projected method to the simple projection of its continuous-time limit. The limit leads to an ordinary differential equation with discontinuous right-hand side. The presented theory is based on a smoothed version of the limit, borrowing ideas from barrier methods.

In addition to the EKI based on perturbed observations, we will also consider the ensemble square root filter (ESRF) applied to inverse problems. The ESRF [23, 27, 36] is a modification of the EnKF, with the key difference of being deterministic, as no perturbed observations are included. We emphasize that our work is primarily focused on the EnKF for inverse problems; we will note throughout the paper whether various results can be generalized to the ESRF. In order to justify the proposed projected EKI, we provide numerical examples demonstrating the improvements, both for the original projected EKI and for the modified projected EKI with variance inflation.

The main contributions of our work can be summarized as follows:

  • We propose a new variant of the EKI and the ESRF for inverse problems which allows us to incorporate box constraints on the unknown parameters. The modification of the original EKI is motivated by viewing the algorithm as a preconditioned gradient-like flow. The scheme introduces a tailored variance inflation to ensure descent directions with respect to the misfit functional in each iteration.

  • We derive a complete convergence analysis for the EKI with perturbed observations and the ESRF for inverse problems, based on the continuous time limit of the suggested variant in the case of a linear forward problem.

  • The validity of the results obtained is highlighted through numerical examples. All examples are based on partial differential equations (PDEs) with linear and nonlinear forward operators.

1.1. Outline

The article is structured as follows: in Section 2 we provide an introduction to box-constrained optimization. We present this in a general sense before discussing its application to EKI, which we also review. Section 3 is dedicated to the derivation of the projected continuous time limit and its gradient flow structure. Convergence and accuracy results are presented in the linear setting, where we also introduce the notion of variance inflation. Section 4 is devoted to numerical experiments on PDE-constrained inverse problems. The purpose of this section is to verify the theoretical findings on both linear and nonlinear problems. Finally, in Section 5 we conclude with a number of remarks and avenues of future work.

2. Background on EKI and box-constrained optimization

The goal of computation for this work is to infer an unknown parameter $u \in \mathcal{X}$ from noisy data $y \in \mathbb{R}^K$ of the form

(2.1) $y = G(u) + \eta$,

where $G : \mathcal{X} \to \mathbb{R}^K$ is our parameter-to-observation mapping, $K$ denotes the number of observations and $\eta \sim \mathcal{N}(0, \Gamma)$ models the observational noise. For simplicity, we will assume that the parameter space is finite dimensional, i.e. $\mathcal{X} = \mathbb{R}^d$. Most of the results presented in the following sections can be straightforwardly generalized to the infinite-dimensional setting under suitable assumptions on the forward operator $G$. To avoid the technicalities arising from the analysis of infinite-dimensional differential equations and optimization problems, we will work under the assumption that the parameter space is finite dimensional (possibly after applying a finite-dimensional approximation of the unknown quantity $u$).

2.1. Continuous time limit of the EKI

We briefly describe the EKI algorithm below and refer to [16, 18, 31] for more details on the derivation of the method. EKI works by updating an initial ensemble of particles $(u_j^{(0)})_{j=1}^J$ through a two-step procedure. The EKI estimation of (2.1) is given by the usual EnKF update formula, also known as the analysis step,

(2.2) $u_j^{(n+1)} = u_j^{(n)} + C^{up}(u^{(n)})\big(C^{pp}(u^{(n)}) + \Gamma\big)^{-1}\big(y_j^{(n+1)} - G(u_j^{(n)})\big), \quad j = 1, \dots, J,$

where we have the inclusion of perturbed observations

(2.3) $y_j^{(n+1)} = y + \eta_j^{(n+1)}, \quad \eta_j^{(n+1)} \sim \mathcal{N}(0, \Gamma) \ \text{i.i.d.}$

The update formula depends on empirical means and covariances defined as

$\bar{u} = \frac{1}{J}\sum_{j=1}^J u_j, \qquad \bar{G} = \frac{1}{J}\sum_{j=1}^J G(u_j),$

$C^{up}(u) = \frac{1}{J}\sum_{j=1}^J (u_j - \bar{u}) \otimes (G(u_j) - \bar{G}), \qquad C^{pp}(u) = \frac{1}{J}\sum_{j=1}^J (G(u_j) - \bar{G}) \otimes (G(u_j) - \bar{G}),$

where $\otimes$ denotes the tensor product for Hilbert spaces, given by

$(a \otimes b)\, h = \langle b, h \rangle_{\mathcal{H}_2}\, a,$

where $\mathcal{H}_1$ & $\mathcal{H}_2$ denote the Hilbert spaces with their inner products $\langle \cdot, \cdot \rangle_{\mathcal{H}_1}$ and $\langle \cdot, \cdot \rangle_{\mathcal{H}_2}$, such that $a \in \mathcal{H}_1$ and $b, h \in \mathcal{H}_2$. The convergence analysis will be based on the analysis of the continuous time limit of the projected EKI in the linear case. This limit was derived for the original EKI by Schillings et al. [31] and is given by

(2.4) $\frac{\mathrm{d}u_j}{\mathrm{d}t} = \frac{1}{J}\sum_{k=1}^J \big\langle G(u_k) - \bar{G},\, y - G(u_j) \big\rangle_\Gamma\, (u_k - \bar{u}), \quad j = 1, \dots, J,$

with $u = (u_1, \dots, u_J)$ and initial condition

(2.5) $u_j(0) = u_j^{(0)}, \quad j = 1, \dots, J.$

Here $\langle \cdot, \cdot \rangle_\Gamma = \langle \cdot, \Gamma^{-1} \cdot \rangle$ denotes the weighted Euclidean inner product in $\mathbb{R}^K$. Note that the limit is derived by neglecting the perturbations of the observations in each iteration, see [31] for more details. The analysis of the stochastic differential equation corresponding to the limit of (2.2) including the perturbed observations will be subject to future work.
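To make the discrete scheme concrete, the following minimal sketch implements the analysis step (2.2)-(2.3) with perturbed observations; the function name eki_step and the linear test problem in the usage part are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def eki_step(U, G, y, Gamma, rng):
    """One EKI analysis step (2.2)-(2.3) with perturbed observations.

    U: (d, J) ensemble of J particles, G: map R^d -> R^K, y: (K,) data.
    """
    J = U.shape[1]
    GU = np.column_stack([G(U[:, j]) for j in range(J)])      # forward evaluations
    dU = U - U.mean(axis=1, keepdims=True)
    dG = GU - GU.mean(axis=1, keepdims=True)
    C_up = dU @ dG.T / J                                      # C^{up}(u)
    C_pp = dG @ dG.T / J                                      # C^{pp}(u)
    # perturbed observations y_j = y + eta_j, eta_j ~ N(0, Gamma)
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=J).T
    return U + C_up @ np.linalg.solve(C_pp + Gamma, Y - GU)   # analysis step (2.2)

# usage on an illustrative linear problem y = A u + eta
rng = np.random.default_rng(0)
d, K, J = 10, 5, 50
A = rng.standard_normal((K, d))
Gamma = 0.01 * np.eye(K)
y = A @ rng.standard_normal(d) + rng.multivariate_normal(np.zeros(K), Gamma)
U = rng.standard_normal((d, J))
for _ in range(50):
    U = eki_step(U, lambda u: A @ u, y, Gamma, rng)
print("data misfit:", np.linalg.norm(A @ U.mean(axis=1) - y))
```

Note that each step only requires evaluations of the forward map $G$, reflecting the derivative-free character of the method.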

In the case of a linear forward problem $G(\cdot) = A\cdot$, the limit can be equivalently viewed as a preconditioned gradient flow

$\frac{\mathrm{d}u_j}{\mathrm{d}t} = -C(u)\, \nabla_u \Phi(u_j; y), \qquad \Phi(u; y) = \frac{1}{2}\, \| y - Au \|_\Gamma^2,$

with $\| \cdot \|_\Gamma = \| \Gamma^{-1/2} \cdot \|$ denoting the weighted Euclidean norm in $\mathbb{R}^K$, misfit functional $\Phi$ and empirical covariance $C(u) = \frac{1}{J}\sum_{j=1}^J (u_j - \bar{u}) \otimes (u_j - \bar{u})$.
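As a quick numerical sanity check of this equivalence (a sketch for the linear setting; all variable names are illustrative), one can verify that the particle form of the right-hand side of (2.4) coincides with the preconditioned gradient $-C(u)\nabla\Phi(u_j; y)$:

```python
import numpy as np

rng = np.random.default_rng(3)
d, K, J = 4, 3, 6
A = rng.standard_normal((K, d))
Gamma = np.diag(rng.uniform(0.5, 1.5, K))
U = rng.standard_normal((d, J))
y = rng.standard_normal(K)

dU = U - U.mean(axis=1, keepdims=True)
C = dU @ dU.T / J                                    # empirical covariance C(u)
GU = A @ U
dG = GU - GU.mean(axis=1, keepdims=True)

j = 0
# particle form: (1/J) sum_k <G(u_k) - Gbar, y - G(u_j)>_Gamma (u_k - ubar)
w = dG.T @ np.linalg.solve(Gamma, y - GU[:, j])
rhs_particle = dU @ w / J
# gradient flow form: -C(u) grad Phi(u_j; y)
grad_Phi = A.T @ np.linalg.solve(Gamma, A @ U[:, j] - y)
print(np.allclose(rhs_particle, -C @ grad_Phi))      # True
```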

2.2. Continuous time limit of the ESRF for inverse problems

Using a deterministic transformation of the ensemble satisfying the Kalman equations leads to the ESRF for inverse problems, see [30]. Based on the results from Reich et al. [4, 5], the continuous time limit of the ESRF for inverse problems is given by

(2.6) $\frac{\mathrm{d}u_j}{\mathrm{d}t} = C^{up}(u)\, \Gamma^{-1} \Big( y - \frac{G(u_j) + \bar{G}}{2} \Big), \quad j = 1, \dots, J,$

with $u = (u_1, \dots, u_J)$ and initial condition $u_j(0) = u_j^{(0)}$,

and its gradient flow structure

(2.7) $\frac{\mathrm{d}u_j}{\mathrm{d}t} = -\frac{1}{2}\, C(u) \big( \nabla_u \Phi(u_j; y) + \nabla_u \Phi(\bar{u}; y) \big)$

in the linear setting.

Both variants of the ensemble Kalman inversion satisfy the well-known subspace property, which states that the estimate will be a linear combination of the initial ensemble members.

Lemma 2.1.

[31] Let $\mathcal{S}$ be the linear span of the initial ensemble, i.e. $\mathcal{S} = \mathrm{span}\{u_1^{(0)}, \dots, u_J^{(0)}\}$; then we have that $u_j(t) \in \mathcal{S}$ for all $t \ge 0$ and all $j = 1, \dots, J$, for $u_j$ satisfying (2.4) and (2.7), respectively.
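The discrete iteration inherits the same property. The following self-contained sketch (the compact eki_step and the test data are illustrative assumptions) checks it numerically for a linear forward map with fewer particles than parameters:

```python
import numpy as np

def eki_step(U, A, y, Gamma, rng):
    # compact linear EKI analysis step, cf. the sketch in Section 2.1
    GU = A @ U
    dU = U - U.mean(axis=1, keepdims=True)
    dG = GU - GU.mean(axis=1, keepdims=True)
    J = U.shape[1]
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=J).T
    return U + (dU @ dG.T / J) @ np.linalg.solve(dG @ dG.T / J + Gamma, Y - GU)

rng = np.random.default_rng(4)
d, K, J = 8, 4, 3                       # fewer particles than parameters
A = rng.standard_normal((K, d))
Gamma = 0.1 * np.eye(K)
y = rng.standard_normal(K)
U0 = rng.standard_normal((d, J))
Q, _ = np.linalg.qr(U0)                 # orthonormal basis of span(U0)

U = U0.copy()
for _ in range(20):
    U = eki_step(U, A, y, Gamma, rng)
print(np.allclose(Q @ (Q.T @ U), U))    # True: iterates remain in span(U0)
```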

The preconditioned gradient flow structure viewpoint opens up the perspective to include additional constraints such as box constraints or, more generally, nonlinear constraints. The remaining section is devoted to the introduction of box-constrained optimization. We will focus in the following on the linear case, i.e. assuming that $G(\cdot) = A\cdot$ with $A \in \mathbb{R}^{K \times d}$, to illustrate the basic ideas. Then, the optimization problem consists of a linear least-squares problem, i.e. the minimization of $\Phi(\cdot\,; y)$. We will modify the differential equations (2.4) and (2.7), respectively, to include the constraints using the ideas introduced below. We will see that we can interpret the suggested modifications as a variance inflation technique. In particular, this viewpoint implies that the subspace property will not hold anymore for the modified versions of the EKI.

2.3. The linear box-constrained optimization problem

Motivated by the optimization perspective on the EKI, we consider the following optimization problem: The objective function consists of the least-squares function $f(u) = \frac{1}{2}\|y - Au\|_\Gamma^2$ with $A \in \mathbb{R}^{K \times d}$, where $u \in \mathbb{R}^d$. We further define the set of the constraints by

(2.8) $g_i(u) \le 0, \quad i = 1, \dots, m,$

for convex constraint functions $g_i : \mathbb{R}^d \to \mathbb{R}$, $i = 1, \dots, m$. We set $\mathcal{G} = \{ u \in \mathbb{R}^d : g_i(u) \le 0,\ i = 1, \dots, m \}$, i.e. the feasible set is given by $\mathcal{G}$. The constrained optimization problem is then given by

(2.9) $\min_{u \in \mathcal{G}} f(u).$

Note that the optimization problem (2.9) is convex, which implies that necessary optimality conditions are also sufficient [7]. Let $u^* \in \mathcal{G}$ be a Karush-Kuhn-Tucker (KKT) point for (2.9), i.e. there exists $\lambda \in \mathbb{R}^m$ such that

  • $\lambda_i \ge 0$ for all $i = 1, \dots, m$,

  • $\lambda_i\, g_i(u^*) = 0$ for all $i = 1, \dots, m$,

  • $\nabla f(u^*) + \sum_{i=1}^m \lambda_i \nabla g_i(u^*) = 0$;

then $u^*$ is a global minimizer of (2.9). Box constraints are a special instance of the feasible set $\mathcal{G}$, i.e. $g_i(u) = \langle e_i, u \rangle - b_i$ and $g_{d+i}(u) = a_i - \langle e_i, u \rangle$ for $i = 1, \dots, d$, where $e_i$ is the $i$-th unit vector, and $a_i \le b_i$ are the lower and upper bounds on the $i$-th component of $u$.

Remark 2.2.

In case $A^\top \Gamma^{-1} A$ is symmetric and positive definite, the optimization problem (2.9) is strictly convex. In particular, there exists at most one stationary point, which is the global minimum of problem (2.9) (cp. [7], Proposition 2.1.1). However, note that this regularity assumption on $A^\top \Gamma^{-1} A$ is in general not satisfied for inverse problems due to the ill-conditioning of the problem.

2.3.1. The projected gradient method for box-constraints

We will shortly discuss the projected gradient method to numerically solve (2.9) and refer the reader e.g. to the work of Bertsekas [7] for a more detailed description of the method (in the discrete time setting).

Let $\mathcal{B} = [a_1, b_1] \times \dots \times [a_d, b_d] \subset \mathbb{R}^d$ denote a box. We can define a general projection $\mathcal{P}_{\mathcal{B}}$ as mapping the space of the ensemble to the specific box. Since we will fix the box for the following part, we will write $\mathcal{P}$ instead of $\mathcal{P}_{\mathcal{B}}$. We define the projection componentwise as

$\mathcal{P}(u)_i = \min\big(b_i, \max(a_i, u_i)\big), \quad i = 1, \dots, d.$

The projected gradient method with step size $s > 0$ is based on the iteration

(2.10) $u^{k+1} = \mathcal{P}\big(u^k - s\, \nabla f(u^k)\big), \quad k = 0, 1, 2, \dots$
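A minimal sketch of the componentwise projection and the iteration (2.10); the least-squares objective with identity noise covariance in the usage part is an illustrative assumption.

```python
import numpy as np

def project_box(u, a, b):
    """Componentwise projection onto the box [a_1, b_1] x ... x [a_d, b_d]."""
    return np.minimum(b, np.maximum(a, u))

def projected_gradient(grad_f, u0, a, b, s=1e-2, n_iter=5000):
    """Projected gradient iteration (2.10) with fixed step size s."""
    u = project_box(u0, a, b)
    for _ in range(n_iter):
        u = project_box(u - s * grad_f(u), a, b)
    return u

# usage on f(u) = 0.5 * ||y - A u||^2 with box constraints [-1, 1]^3
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
grad_f = lambda u: A.T @ (A @ u - y)
a, b = -np.ones(3), np.ones(3)
u_star = projected_gradient(grad_f, np.zeros(3), a, b)
assert np.all((a <= u_star) & (u_star <= b))    # the iterates stay feasible
```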

One can derive a continuous time limit by considering the step size $s$ going to zero. By using directional derivatives we obtain

(2.11) $\frac{\mathrm{d}u}{\mathrm{d}t} = \lim_{s \searrow 0} \frac{\mathcal{P}\big(u - s\, \nabla f(u)\big) - u}{s}.$

More details on the continuous time limit of the projected EKI can be found in Section 3.1.

Remark 2.3.

Since the right-hand side (RHS) of (2.11) is discontinuous, it is not obvious that a solution to this system exists. To ensure unique existence we consider a smoothed version of (2.11) by approximating the limit with ideas inspired from barrier methods. We introduce the parametrized, convex optimization problems

(2.12) $\min_u\ f(u) - \frac{1}{\tau} \sum_{i=1}^m \log\big(-g_i(u)\big),$

with parameter $\tau > 0$ and inequality constraints $g_i(u) < 0$, $i = 1, \dots, m$. As $\tau \to \infty$, the log barrier functions become closer to the indicator function of the feasible set of the original problem. We equivalently consider the problems

(2.13) $\min_u\ \tau f(u) + \varphi(u)$

in the following, where we define the log-barrier function $\varphi(u) = -\sum_{i=1}^m \log(-g_i(u))$. We will approximate (2.11) for $\tau > 0$ by the gradient flow

$\frac{\mathrm{d}u_\tau}{\mathrm{d}t} = -\nabla f(u_\tau) - \frac{1}{\tau} \nabla \varphi(u_\tau).$

Theorem 2.4.

Let $\tau > 0$ and let $u_\tau$ denote the solution of the smoothed initial value problem of (2.11) with $u_\tau(0) = u_0 \in \mathrm{int}(\mathcal{G})$.

Further assume that $A^\top \Gamma^{-1} A$ is positive definite and that there exists a (unique global) minimizer $\bar{u}_\tau$ of (2.13). Then for each $\tau > 0$ it holds true that

$\lim_{t \to \infty} \| u_\tau(t) - \bar{u}_\tau \| = 0,$

i.e. the solution converges to the (global) minimizer of (2.13).

Proof.

We define $V(t) = \tau f(u_\tau(t)) + \varphi(u_\tau(t))$ and prove that $V$ is a strict Lyapunov function by the strict convexity of the optimization problem. The flow of $V$ satisfies

$\frac{\mathrm{d}V}{\mathrm{d}t} = -\tau \Big\| \nabla f(u_\tau) + \frac{1}{\tau} \nabla \varphi(u_\tau) \Big\|^2 \le 0,$

with equality only in the stationary point; thus, the claim follows. ∎

Remark 2.5.

By duality arguments (see [10, Section 11.2] for more details), the accuracy of the approximation can be bounded by

$f(\bar{u}_\tau) - f(u^*) \le \frac{m}{\tau},$

where $u^*$ denotes the minimizer of the original problem (2.9) and $m$ is the number of inequality constraints. In particular, $f(\bar{u}_\tau) \to f(u^*)$ for $\tau \to \infty$ and thus $\bar{u}_\tau \to u^*$.
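The bound can be checked numerically on a one-dimensional instance (a hypothetical example chosen purely for illustration): minimize $f(u) = \frac{1}{2}(u - 2)^2$ over the box $[-1, 1]$, whose constrained minimizer is $u^* = 1$ with $f(u^*) = 0.5$ and $m = 2$ constraints.

```python
import numpy as np

f = lambda u: 0.5 * (u - 2.0) ** 2
for tau in [10.0, 100.0, 1000.0]:
    # barrier objective (2.12) for the box [-1, 1]
    f_tau = lambda u: f(u) - (np.log(1.0 - u) + np.log(u + 1.0)) / tau
    grid = np.linspace(-1.0 + 1e-9, 1.0 - 1e-9, 200_001)
    u_tau = grid[np.argmin(f_tau(grid))]        # barrier minimizer on a fine grid
    # duality bound of Remark 2.5 with m = 2 constraints
    print(tau, f(u_tau) - 0.5 <= 2.0 / tau)     # True for each tau
```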

Remark 2.6.

Formally, we obtain in the limit $\tau \to \infty$ the same result stated in Theorem 2.4 for $u$ being a solution of the initial value problem (2.11) with feasible initial condition: We define the Lyapunov candidate $V(t) = f(u(t)) - f(u^*)$ and split the coordinates according to the index sets of active and inactive constraints along the flow.

Since $u(t)$ is a feasible point, the projected flow vanishes in the coordinates pushing against active constraints and coincides with the negative gradient in the remaining coordinates, and we obtain $\frac{\mathrm{d}V}{\mathrm{d}t} \le 0$,

where we have used the strict convexity of $f$ and the optimality of $u^*$, and we conclude formally with $u(t) \to u^*$ as $t \to \infty$.

Corollary 2.7.

Let $\tau > 0$ and let $u_\tau$ denote the solution of the smoothed initial value problem of (2.11) with $u_\tau(0) = u_0 \in \mathrm{int}(\mathcal{G})$.

Further assume that there exists a (global) minimizer of (2.9). Then it holds true that

$\lim_{t \to \infty} \| A u_\tau(t) - A \bar{u}_\tau \|_\Gamma = 0,$

where $\bar{u}_\tau$ is a KKT point of (2.13).

Proof.

Let $\bar{u}_\tau$ be an arbitrary KKT point of (2.13). The flow of $A u_\tau(t)$ in the observation space corresponds to the gradient flow of a strictly convex optimization problem in the observation space. Thus, by the same arguments as before in Theorem 2.4, the claim follows. ∎

Remark 2.8.

Formally, we again obtain in the limit $\tau \to \infty$ the same result stated in Corollary 2.7 for $u$ being a solution of (2.11): Let $u^*$ be an arbitrary KKT point of (2.9). Controlling the projected directions with the help of the Frobenius norm $\|\cdot\|_F$, one obtains an integral inequality for the residual $\| A u(t) - A u^* \|_\Gamma^2$, and an application of Gronwall's integral inequality yields the convergence of the residuals in the observation space.

2.3.2. The preconditioned projected gradient method

We consider the preconditioned version of the iteration (2.10) in discrete time given by

(2.14) $u^{k+1} = \mathcal{P}\big(u^k - s\, B\, \nabla f(u^k)\big), \quad k = 0, 1, 2, \dots,$

where $B$ is a symmetric, positive definite matrix. It is well known that for an arbitrary choice of $B$, descent cannot be guaranteed for any choice of the step size $s$. We will briefly discuss an example demonstrating that the preconditioning of the gradient flow does not lead to a descent direction in general (cp. [6]). This is highlighted in Figure 1.

Figure 1. Varying contour lines of the function defined in Example 2.9, with both the preconditioned descent direction in the unconstrained case and the projected preconditioned descent direction.
Example 2.9.

We consider the following 2-dimensional quadratic example: We define the quadratic function $f$

and consider the minimization problem of $f$ subject to the given box constraints. The gradient of $f$ is given by

The current iterate is denoted by $u$. As a preconditioner, we consider the symmetric, positive definite matrix $B$

For step size $s > 0$, the next iterate is given by

where $\mathcal{P}$ is the projection onto the feasible set. Then,

i.e. for all $s > 0$ the function value of the objective function increases.
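Since the concrete numbers of Example 2.9 are not reproduced above, the following sketch verifies the phenomenon on a hypothetical instance of the same type (all matrices are our own illustrative choices, not taken from the paper):

```python
import numpy as np

# hypothetical instance: f(u) = 0.5 u^T Q u on the feasible set {u : u_2 >= 0}
Q = np.array([[1.0, 2.0], [2.0, 5.0]])       # symmetric positive definite Hessian
B = np.array([[1.0, -0.6], [-0.6, 1.0]])     # symmetric positive definite preconditioner
f = lambda u: 0.5 * u @ Q @ u
grad = lambda u: Q @ u

u = np.array([1.0, 0.0])                     # the constraint u_2 >= 0 is active
for s in [0.1, 0.01, 0.001]:
    v = u - s * B @ grad(u)                  # preconditioned gradient step
    v[1] = max(v[1], 0.0)                    # projection onto {u : u_2 >= 0}
    print(s, f(v) > f(u))                    # True: the objective increases
```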

Example 2.9 shows that a simple projection strategy for the EKI, which is a preconditioned gradient flow in the linear case, does not lead to a convergent descent method in general.

To ensure a descent direction for the preconditioned projected gradient method, we follow the approach of [6, Proposition 1], which suggests using matrices that are diagonal with respect to the subset of indices containing the (nearly) active coordinates, e.g.

(2.15) $I^+(u) = \{\, i : u_i \le a_i + \varepsilon \ \text{and}\ \partial_i f(u) > 0 \,\} \cup \{\, i : u_i \ge b_i - \varepsilon \ \text{and}\ \partial_i f(u) < 0 \,\}$

for some $\varepsilon > 0$.

We will give more details on the modification of the preconditioner in the context of the EKI in the following. In particular, we will make use of data assimilation methods such as variance inflation to ensure a descent. This will be discussed in greater detail in Section 3.
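On the same hypothetical instance as above, making the preconditioner diagonal with respect to the active index restores descent, as the following sketch illustrates:

```python
import numpy as np

Q = np.array([[1.0, 2.0], [2.0, 5.0]])
B = np.array([[1.0, -0.6], [-0.6, 1.0]])
f = lambda u: 0.5 * u @ Q @ u
grad = lambda u: Q @ u

# zero the off-diagonal entries of B in the row/column of the active index
# (here the second coordinate), keeping the diagonal entry, cf. [6]
B_mod = B.copy()
B_mod[1, :] = 0.0
B_mod[:, 1] = 0.0
B_mod[1, 1] = B[1, 1]

u = np.array([1.0, 0.0])
for s in [0.1, 0.01, 0.001]:
    v = u - s * B_mod @ grad(u)
    v[1] = max(v[1], 0.0)                    # projection onto {u : u_2 >= 0}
    print(s, f(v) < f(u))                    # True: descent is restored
```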

3. Convergence analysis

In this section we introduce a variant of the projected EKI and derive the continuous limit of the algorithm. We will provide a complete convergence analysis of the proposed modification in the linear setting. This will include an accuracy result of the proposed algorithm, the analysis of the ensemble collapse and the convergence to the truth.

Recall that the componentwise projection onto the box $\mathcal{B}$ is denoted by $\mathcal{P}$. To incorporate the projection into our scheme, we modify the procedure described in Section 2. We now define our prediction step in its projected form as

and similarly for the covariances

Then, using these estimates, we can construct our update formula in closed form,

(3.1)

which is almost identical to the update formula for the non-projected EnKF.

In order to analyze the projection, we now derive a continuous time limit.

3.1. Continuous time limit

We now derive a continuous time limit for the projected EnKF for inverse problems. The form of the limit will differ from that described in Section 2, as we wish to quantify the limit in terms of directional derivatives. We begin by recalling the equations in closed form, given by the component-wise increments

By using the Neumann expansion for part of the Kalman gain, we observe for positive semidefinite $C^{pp}(u)$ and step size $h > 0$ that

(3.2) $\big(h\, C^{pp}(u) + \Gamma\big)^{-1} = \Gamma^{-1} + \mathcal{O}(h).$

Defining the corresponding limiting quantities and using (3.2) together with the definition of directional derivatives, we obtain

(3.3)

Finally, we obtain the continuous time limit for the EnKF by

(3.4)

where the right-hand side is given by

In the linear case we can write (3.4) as

(3.5)

with $f$ from the minimization problem (2.9).

Remark 3.1.

Similarly to Remark 2.3, the RHS of (3.4) is discontinuous, and we will consider the smoothed system

for $\tau > 0$.

3.2. Variance inflation

The EnKF and the ESRF are known to have certain difficulties in high dimensional settings. One such case is when the dimension of the parameter space is considerably larger than the size of the ensemble. As a result, the empirical covariance matrices can severely underestimate the actual spread. One common way to alleviate this issue is variance inflation [2, 3, 22].

The idea is to inflate the empirical covariance by another covariance matrix, typically that of the prior. This usually takes one of two forms, additive inflation or multiplicative inflation, both defined below as

(3.6) $C(u) \mapsto C(u) + \gamma\, C_0$ (additive),
(3.7) $C(u) \mapsto (1 + \gamma)\, C(u)$ (multiplicative),

where $C_0$ is our prior covariance and $\gamma > 0$ is a constant. In the context of EKI, this was first applied in [31]. To keep our work consistent with previous work, we stick to the case of additive inflation (3.6). By incorporating this form of inflation, our gradient flow structure is modified to

(3.8) $\frac{\mathrm{d}u_j}{\mathrm{d}t} = -\big(C(u) + \gamma\, C_0\big)\, \nabla_u \Phi(u_j; y).$

We note that if one were to apply the inflation to the projected version, the limit would differ in that the flow would be projected accordingly.
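A minimal sketch of the two inflation variants (3.6)-(3.7) applied to an empirical ensemble covariance; the prior covariance $C_0$ and the value of $\gamma$ are illustrative assumptions.

```python
import numpy as np

def empirical_cov(U):
    """Empirical covariance C(u) of an ensemble U with shape (d, J)."""
    dU = U - U.mean(axis=1, keepdims=True)
    return dU @ dU.T / U.shape[1]

def inflate_additive(C, C0, gamma):
    """Additive variance inflation (3.6): C + gamma * C0."""
    return C + gamma * C0

def inflate_multiplicative(C, gamma):
    """Multiplicative variance inflation (3.7): (1 + gamma) * C."""
    return (1.0 + gamma) * C

rng = np.random.default_rng(2)
U = rng.standard_normal((8, 4))              # small ensemble: J - 1 < d
C = empirical_cov(U)                         # rank-deficient preconditioner
C_inf = inflate_additive(C, np.eye(8), gamma=0.1)
print(np.linalg.matrix_rank(C), np.linalg.matrix_rank(C_inf))   # 3, 8
```

Additive inflation renders the preconditioner strictly positive definite, which is precisely the mechanism that breaks the subspace property of Lemma 2.1.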

3.3. Convergence analysis in the linear case

In the linear case, we view the continuous time limit of the projected EnKF as a preconditioned gradient flow.

3.3.1. Transformed method for the EnKF

In Example 2.9 we have seen that, in the case of preconditioned projected gradient methods, it is not possible to ensure a descent direction for an arbitrary choice of a positive definite matrix $B$. Based on the results of Bertsekas [6], we will transform (3.4) in such a way that the preconditioner is diagonal with respect to an index set which is built similarly to the set from (2.15). Since we consider a system of particles, we will use a preconditioner which is diagonal with respect to an index set which depends on the whole ensemble of particles $u = (u_1, \dots, u_J)$; in particular we set

Similarly, we could also choose the index set to be

In this work we will focus on the former choice. Therefore, we consider the preconditioned gradient flow given by

(3.9)

where and the preconditioner is given by

(3.10)

Let $u = (u_1, \dots, u_J)$ be a system of particles in $\mathbb{R}^d$. Without loss of generality, we assume that the coordinates are ordered such that the indices of the index set come first. Hence, the preconditioner can be written as

where .

Remark 3.2.

Consider the continuous time limit of the projected EnKF in equation (3.4). When we introduce variance inflation for (3.5) with the help of a diagonal matrix, e.g. by

(3.11)

then the preconditioner is diagonal with respect to the index set and can be written in the form of (3.10).

This can be seen as follows: since evolves by (3.9) we have that for all and . For simplicity, we assume . Let , then it follows

Since the particles are feasible for all , we obtain

and finally for all . This leads to

for , and .

Hence, we can view the projected EnKF with variance inflation as a special case of the presented method.

3.3.2. Transformed method for the ESRF

The continuous limit (3.4) and the gradient flow (3.5) can be adapted to the case of the ESRF. The same can be said for the projected preconditioned limit above. In order to do so, the closed form needs to be changed such that the limit follows a form similar to (2.4).

The resulting transformed method for the ESRF is given by

(3.12)

where now and the preconditioner is again given by (3.10).

For simplicity, we will focus in the following on the transformed method for the EKI (3.9). All results can be easily adapted to the transformed method (3.12) for the ESRF.

3.3.3. Convergence Results

Let $u^*$ be a KKT point for (2.9) and let $u(t)$ be the solution of the system (3.9). Our aim is to prove the convergence of the residuals as $t \to \infty$. To show convergence, we have to control the empirical covariance over time. In particular, in the following proposition we will prove that we can bound the ensemble spread over time.

Proposition 3.3.

Let $u^{(0)} = (u_1^{(0)}, \dots, u_J^{(0)})$ be the initial ensemble and let $u(t)$ denote the solution of the approximated flow of (3.9) given by

(3.13)

where has been defined in Remark 2.3. Then, it holds true that

for all .

Proof.

We consider the deviations from the ensemble mean, $e_j = u_j - \bar{u}$, and show that $\frac{\mathrm{d}}{\mathrm{d}t} \frac{1}{J} \sum_{j=1}^J \|e_j\|^2 \le 0$, which implies the monotonicity of the ensemble spread. We have