1. Introduction
Inverse problems aim to recover a quantity of interest from perturbed, noisy measurements. Traditionally, inverse problems were solved in a least-squares manner, where the solution corresponds to the minimizer of a suitable functional [14]. A more recent alternative is the statistical approach, in which the quantity of interest is described by a probability distribution constructed via Bayes' theorem, known as the Bayesian approach, see e.g. [19, 35]. The advantage of the latter formulation is that it naturally quantifies the uncertainty in the solution through statistical properties. Common algorithms designed for Bayesian inversion include sampling-based methods such as Markov chain Monte Carlo (MCMC) methods, but also variational Bayes methods. One recent method aimed at solving Bayesian inverse problems through data assimilation methodologies
[24, 29] is ensemble Kalman inversion (EKI). EKI can be viewed as the application of the ensemble Kalman filter (EnKF) [15, 16, 21, 25] to inverse problems. It was first proposed by Iglesias et al. [18, 26], motivated as a derivative-free solver for inverse problems that offers a cheaper approximation of the solution compared to traditional methods. EKI provides a unique perspective: despite being aimed at Bayesian inversion, its methodology is not based on sampling like MCMC, but proceeds in a variational manner. Since its formulation, a number of research directions have been considered, such as applications, building theory and applying uncertainty quantification techniques [8, 9, 11, 17, 31]. However, the basic version of the EKI does not allow one to incorporate additional constraints on the parameters, which arise in many applications due to additional knowledge on the system. In the following, we will focus on the efficient incorporation of box constraints. An example of this type of constraint arises in hierarchical EKI [11], where the hyperparameters are often defined through a uniform distribution. It is well known that the EKI may lead to estimates of the unknown parameters which are infeasible, i.e. the estimates do not satisfy the box constraints imposed by the uniform distribution on the hierarchical parameters. As a result, there is a strong motivation to study the incorporation of constraints for the EKI.
Introducing constrained optimization for the EnKF has been a challenge which has drawn increasing attention in recent years. A literature overview of existing methods for the treatment of linear and nonlinear constraints for Kalman-based methods can be found in [1, 34]. The projection of the estimates onto the feasible set is a very common approach, see e.g. [20, 37], which can be generalized to nonlinear constraints by linearization ideas. Most of the variants are motivated by interpreting the Kalman-based updates as the solution of a corresponding optimization problem, see [1] for more details. This viewpoint allows one to include constraints on the parameters and states in a straightforward way. In [13], the authors suggest a new approach to handle linear equality and inequality constraints for the EnKF and EKI by reparameterizing the solution of the optimization problem in the range of the covariance, i.e. by seeking the solution of the optimization problem in a subspace defined by the initial ensemble. In this work, we will adopt the optimization viewpoint, i.e. we will view the EKI as a derivative-free optimizer for the least-squares misfit functional (cp. [31]), and motivate the incorporation of additional constraints on the unknown parameters via projection-based methods onto the feasible set. To illustrate the idea, we focus in the following on the incorporation of box constraints, i.e. we will introduce a variant of the EKI ensuring that the estimate of the unknown parameter remains within a box of the parameter domain. Our work will build on the theory by Bertsekas [6, 7] and others [32, 33] for projected preconditioned gradient methods. For inverse problems, box-constrained optimization has been applied before, but in a different context and for different applications; we refer the reader to [28].
We will show that a simple projection of the EKI estimate onto the feasible set does not necessarily lead to a convergent scheme. To address this, we propose using techniques from data assimilation, such as variance inflation [2, 3, 22], to ensure that one can attain the correct descent direction. Our analysis then consists of an existence and uniqueness result, a quantification of the ensemble collapse and the convergence of the residuals. We aim to compare this modified projected method to the simple projection of its continuous-time limit. The limit leads to an ordinary differential equation with discontinuous right-hand side. The presented theory is based on a smoothed version of the limit, borrowing ideas from barrier methods.
In addition to the EKI based on perturbed observations, we will also consider the ensemble square root filter (ESRF) applied to inverse problems. The ESRF [23, 27, 36] is a modification of the EnKF, with the key difference of being deterministic, as no perturbed observations are included. We emphasize that our work is primarily focused on the EnKF for inverse problems; we will note throughout the paper whether the various results can be generalized to include the ESRF. In order to justify the proposed projected EKI, we provide numerical examples demonstrating the improvements, both for the original projected EKI and for the modified projected EKI with variance inflation.
The main contributions of our work can be summarized as follows:

We propose a new variant of the EKI and the ESRF for inverse problems which allows the incorporation of box constraints on the unknown parameters. The modification of the original EKI is motivated by viewing the algorithm as a preconditioned gradient-like flow. The scheme introduces a tailored variance inflation to ensure descent directions with respect to the misfit functional in each iteration.

We derive a complete convergence analysis for the EKI with perturbed observations and the ESRF for inverse problems, based on the continuous time limit of the suggested variant in the case of a linear forward problem.

The validity of the obtained results is illustrated through numerical examples. All examples are based on partial differential equations (PDEs) with linear and nonlinear forward operators.
1.1. Outline
The article is structured as follows: in Section 2 we provide an introduction to box-constrained optimization. We present this in a general setting before discussing its application to EKI, which we also review. Section 3 is dedicated to the derivation of the projected continuous time limit and its gradient flow structure. Convergence and accuracy results will be presented in the linear setting, where we also introduce the notion of variance inflation. Section 4 is devoted to numerical experiments on PDE-constrained inverse problems. The purpose of this section is to verify the theoretical findings on both linear and nonlinear problems. Finally, in Section 5 we conclude with a number of remarks and avenues of future work.
2. Background on EKI and box-constrained optimization
The computational goal of this work is to infer an unknown parameter $u$ from noisy data $y$ of the form
(2.1) $y = G(u) + \eta,$
where $G: \mathcal{X} \to \mathbb{R}^K$ is our parameter-to-observation mapping, $K$ denotes the number of observations and $\eta$ denotes the observational noise. For simplicity, we will assume that the parameter space is finite dimensional, i.e. $\mathcal{X} = \mathbb{R}^d$. Most of the results presented in the following sections can be straightforwardly generalized to the infinite-dimensional setting under suitable assumptions on the forward operator $G$. To avoid the technicalities arising from the analysis of infinite-dimensional differential equations and optimization problems, we will work under the assumption that the parameter space is finite dimensional (possibly after applying a finite-dimensional approximation of the unknown quantity $u$).
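As an illustration, a synthetic instance of the setup (2.1) for a linear forward map can be generated as follows; the operator, dimensions and noise level are hypothetical choices, not taken from the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

d, K = 10, 10                        # parameter and observation dimensions (hypothetical)
A = np.tril(np.ones((K, d)))         # a simple linear forward map G(u) = A u
u_true = rng.uniform(0.0, 1.0, size=d)

Gamma = 0.01 * np.eye(K)             # observational noise covariance
eta = rng.multivariate_normal(np.zeros(K), Gamma)
y = A @ u_true + eta                 # noisy data, cf. (2.1)
```

The box constraints discussed below would then encode the prior knowledge that the components of `u_true` lie in $[0, 1]$.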
2.1. Continuous time limit of the EKI
We briefly describe the EKI algorithm below and refer to [16, 18, 31] for more details on the derivation of the method. EKI works by updating an initial ensemble of particles $\{u_0^{(j)}\}_{j=1}^J$ through a two-step procedure. The EKI estimation of (2.1) is given by the usual EnKF update formula, also known as the analysis step,
(2.2) $u_{n+1}^{(j)} = u_n^{(j)} + C^{up}(u_n)\left(C^{pp}(u_n) + \Gamma\right)^{-1}\left(y_{n+1}^{(j)} - G(u_n^{(j)})\right), \quad j = 1, \dots, J,$
where we have the inclusion of perturbed observations
(2.3) $y_{n+1}^{(j)} = y + \eta_{n+1}^{(j)}, \quad \eta_{n+1}^{(j)} \sim \mathcal{N}(0, \Gamma)$ i.i.d.
The update formula depends on empirical means and covariances defined as
$\bar{u}_n = \frac{1}{J}\sum_{j=1}^J u_n^{(j)}, \qquad \bar{G}_n = \frac{1}{J}\sum_{j=1}^J G(u_n^{(j)}),$
$C^{up}(u_n) = \frac{1}{J}\sum_{j=1}^J \left(u_n^{(j)} - \bar{u}_n\right) \otimes \left(G(u_n^{(j)}) - \bar{G}_n\right), \qquad C^{pp}(u_n) = \frac{1}{J}\sum_{j=1}^J \left(G(u_n^{(j)}) - \bar{G}_n\right) \otimes \left(G(u_n^{(j)}) - \bar{G}_n\right),$
where $\otimes$ denotes the tensor product for Hilbert spaces, given by
$(x \otimes z)\, v = \langle z, v \rangle_{H_2}\, x,$
where $H_1$ and $H_2$ denote Hilbert spaces with inner products $\langle \cdot, \cdot \rangle_{H_1}$ and $\langle \cdot, \cdot \rangle_{H_2}$, such that $x \in H_1$ and $z \in H_2$. The convergence analysis will be based on the analysis of the continuous time limit of the projected EKI in the linear case. This limit was derived for the original EKI by Schillings et al. [31], and is given by
(2.4) $\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = -\frac{1}{J}\sum_{k=1}^{J} \left\langle G(u^{(k)}) - \bar{G},\; G(u^{(j)}) - y \right\rangle_{\Gamma} \left(u^{(k)} - \bar{u}\right),$
with $j = 1, \dots, J$ and initial condition
(2.5) $u^{(j)}(0) = u_0^{(j)}.$
Here $\langle \cdot, \cdot \rangle_{\Gamma} = \langle \Gamma^{-1/2}\,\cdot\,, \Gamma^{-1/2}\,\cdot\, \rangle$ denotes the weighted Euclidean inner product in $\mathbb{R}^K$. Note that the limit is derived by neglecting the perturbations of the observations in each iteration, see [31] for more details. The analysis of the stochastic differential equation corresponding to the limit of (2.2) including the perturbed observations will be subject to future work.
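A minimal sketch of the analysis step (2.2)-(2.3) for a linear forward map may help fix ideas; the dimensions, forward operator and noise level below are illustrative choices, not the paper's experimental setup:

```python
import numpy as np

def eki_step(U, G, y, Gamma, rng):
    """One EKI analysis step with perturbed observations, cf. (2.2)-(2.3)."""
    J = U.shape[0]
    GU = np.array([G(u) for u in U])            # forward map of each particle, (J, K)
    u_bar, G_bar = U.mean(axis=0), GU.mean(axis=0)
    Cup = (U - u_bar).T @ (GU - G_bar) / J      # empirical cross-covariance, (d, K)
    Cpp = (GU - G_bar).T @ (GU - G_bar) / J     # empirical output covariance, (K, K)
    Y = y + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=J)  # perturbed data
    K_gain = Cup @ np.linalg.inv(Cpp + Gamma)
    return U + (Y - GU) @ K_gain.T

# hypothetical linear test problem
rng = np.random.default_rng(1)
A = np.tril(np.ones((5, 5)))
u_true = np.ones(5)
y = A @ u_true
Gamma = 0.01 * np.eye(5)
U0 = rng.normal(size=(10, 5))                   # initial ensemble of J = 10 particles
U = U0.copy()
for _ in range(50):
    U = eki_step(U, lambda u: A @ u, y, Gamma, rng)
# the data misfit of the ensemble mean decreases compared to the initial ensemble
```

Note that without constraints nothing prevents the updated particles from leaving a prescribed box, which is the issue addressed in the remainder of the paper.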
In the case of a linear forward problem $G(\cdot) = A\cdot$, the limit can be equivalently viewed as a preconditioned gradient flow
$\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = -C(u)\, \nabla_u \Phi(u^{(j)}; y),$
with $\|\cdot\|_{\Gamma} = \|\Gamma^{-1/2}\cdot\|$ denoting the weighted Euclidean norm in $\mathbb{R}^K$, misfit functional $\Phi(u; y) = \frac{1}{2}\|y - Au\|_{\Gamma}^2$ and empirical covariance $C(u) = \frac{1}{J}\sum_{j=1}^J \left(u^{(j)} - \bar{u}\right) \otimes \left(u^{(j)} - \bar{u}\right)$.
2.2. Continuous time limit of the ESRF for inverse problems
Using a deterministic transformation of the ensemble satisfying the Kalman equations leads to the ESRF for inverse problems, see [30]. Based on the results in [4, 5], the continuous time limit of the ESRF for inverse problems is given by
(2.6) $\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = -C^{up}(u)\, \Gamma^{-1} \left( \frac{1}{2}\left( G(u^{(j)}) + \bar{G} \right) - y \right),$
with $j = 1, \dots, J$ and initial condition $u^{(j)}(0) = u_0^{(j)}$, and its gradient flow structure
(2.7) $\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = -\frac{1}{2}\, C(u) \left( \nabla_u \Phi(u^{(j)}; y) + \nabla_u \Phi(\bar{u}; y) \right)$
in the linear setting.
Both variants of the ensemble Kalman inversion satisfy the well-known subspace property, which states that the estimate will be a linear combination of the initial ensemble members.
Lemma 2.1.
Let $(u^{(j)})_{j=1}^J$ be the solution of (2.4) or (2.6). Then, for all $t \ge 0$ and all $j = 1, \dots, J$, the particle $u^{(j)}(t)$ lies in the linear span of the initial ensemble $\{u_0^{(1)}, \dots, u_0^{(J)}\}$.
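The subspace property can be checked numerically: a simple explicit Euler discretization of the deterministic limit (2.4) in the linear case (a sketch, with hypothetical problem data) keeps every particle in the span of the initial ensemble up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 8))                  # hypothetical linear forward operator
Gamma_inv = np.eye(6)
y = A @ rng.normal(size=8)

J, d, h = 4, 8, 0.002
U0 = rng.normal(size=(J, d))                 # J = 4 particles in d = 8 dimensions
U = U0.copy()
for _ in range(200):                         # explicit Euler for the limit (2.4)
    u_bar = U.mean(axis=0)
    C = (U - u_bar).T @ (U - u_bar) / J      # empirical covariance C(u), (d, d)
    grad = (A.T @ Gamma_inv @ (A @ U.T - y[:, None])).T   # rows: grad Phi(u^(j))
    U = U - h * grad @ C                     # du^(j)/dt = -C(u) grad Phi(u^(j))

# every particle stays in span{u_0^(1), ..., u_0^(J)}
coeffs, *_ = np.linalg.lstsq(U0.T, U.T, rcond=None)
residual = np.linalg.norm(U0.T @ coeffs - U.T)
```

With $J < d$ the ensemble thus explores only a four-dimensional subspace of $\mathbb{R}^8$, which is exactly what the variance inflation introduced later is designed to overcome.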
The preconditioned gradient flow structure opens up the perspective to include additional constraints such as box constraints or, more generally, nonlinear constraints. The remainder of this section is devoted to the introduction of box-constrained optimization. We will focus in the following on the linear case, i.e. assuming that $G(\cdot) = A\cdot$ with $A \in \mathcal{L}(\mathcal{X}, \mathbb{R}^K)$, to illustrate the basic ideas. Then, the optimization problem consists of a linear least-squares problem, i.e. $\Phi(u; y) = \frac{1}{2}\|y - Au\|_{\Gamma}^2$. We will modify the differential equations (2.4) and (2.7), respectively, to include the constraints using the ideas introduced below. We will see that we can interpret the suggested modifications as a variance inflation technique. In particular, this viewpoint implies that the subspace property will no longer hold for the modified versions of the EKI.
2.3. The linear box-constrained optimization problem
Motivated by the optimization perspective on the EKI, we consider the following optimization problem: the objective function consists of the least-squares functional $f(u) = \frac{1}{2}\|y - Au\|_{\Gamma}^2$ with $A \in \mathbb{R}^{K \times d}$, where $u \in \mathbb{R}^d$. We further define the sets of constraints by
(2.8) $\mathcal{G}_j = \left\{ u \in \mathbb{R}^d \mid a_j^\top u \le b_j \right\}$
for $a_j \in \mathbb{R}^d$, $b_j \in \mathbb{R}$, $j = 1, \dots, r$. We set $\mathcal{G} = \bigcap_{j=1}^r \mathcal{G}_j$, i.e. the feasible set is given by $\mathcal{G}$. The constrained optimization problem is then given by
(2.9) $\min_{u \in \mathcal{G}} f(u).$
Note that the optimization problem (2.9) is convex, which implies that the necessary optimality conditions are also sufficient [7]. Let $\bar{u}$ be a Karush-Kuhn-Tucker (KKT) point for (2.9), i.e. there exists $\mu \in \mathbb{R}^r$ such that
$\nabla f(\bar{u}) + \sum_{j=1}^r \mu_j a_j = 0,$
$a_j^\top \bar{u} \le b_j$ for all $j = 1, \dots, r$,
$\mu_j \ge 0$ for all $j = 1, \dots, r$,
$\mu_j \left( a_j^\top \bar{u} - b_j \right) = 0$ for all $j = 1, \dots, r$;
then $\bar{u}$ is a global minimizer of (2.9). Box constraints are a special instance of the feasible set $\mathcal{G}$, i.e. $a_j \in \{\pm e_i,\ i = 1, \dots, d\}$, where $e_i$ is the
$i$th unit vector, and the $b_j$
are the lower or upper bounds on the components of $u$.
Remark 2.2.
In case $A^\top \Gamma^{-1} A$ is positive definite, the optimization problem (2.9) is strictly convex. In particular, there exists at most one stationary point, which is the global minimum of problem (2.9) (cp. [7], Proposition 2.1.1). However, please note that this regularity assumption is in general not satisfied for inverse problems due to the ill-conditioning of the problem.
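Since the KKT conditions above characterize the global minimizer, one can verify them numerically for a small box-constrained least-squares problem, here solved by a plain projected gradient iteration (cf. Section 2.3.1); all problem data below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(20, 5))                     # hypothetical design matrix
y = A @ np.array([0.2, 1.5, -0.3, 0.8, 2.0])     # the unconstrained optimum violates the box
a, b = np.zeros(5), np.ones(5)                   # box [0, 1]^5

grad = lambda u: A.T @ (A @ u - y)               # gradient of f(u) = 0.5*||y - Au||^2

u = 0.5 * np.ones(5)
s = 1.0 / np.linalg.norm(A.T @ A, 2)             # step size 1/L for the Lipschitz constant L
for _ in range(5000):
    u = np.clip(u - s * grad(u), a, b)           # projected gradient step

g = grad(u)
# KKT check: interior components have (numerically) zero gradient; at an
# active lower/upper bound the gradient is nonnegative/nonpositive
```

The sign pattern of `g` at the active bounds corresponds exactly to the multipliers $\mu_j \ge 0$ in the conditions above.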
2.3.1. The projected gradient method for box constraints
We will briefly discuss the projected gradient method to numerically solve (2.9), and refer the reader e.g. to the work of Bertsekas [7] for a more detailed description of the method (in the discrete time setting).
Let $\mathcal{B} = [a_1, b_1] \times \cdots \times [a_d, b_d] \subset \mathbb{R}^d$ denote a box. We can define a general projection $\mathcal{P}_{\mathcal{B}}$ mapping the parameter space onto this box. Since we will fix the box in the following, we write $\mathcal{P}$ instead of $\mathcal{P}_{\mathcal{B}}$. We define the projection componentwise as
$(\mathcal{P}(u))_i = \min\{ \max\{ u_i, a_i \}, b_i \}, \quad i = 1, \dots, d.$
The projected gradient method with step size $s > 0$ is based on the iteration
(2.10) $u_{n+1} = \mathcal{P}\left( u_n - s \nabla f(u_n) \right).$
One can derive a continuous time limit by letting $s$ go to zero. By using directional derivatives we obtain
(2.11) $\frac{\mathrm{d}u}{\mathrm{d}t} = \lim_{s \to 0^+} \frac{\mathcal{P}\left( u - s \nabla f(u) \right) - u}{s}.$
More details on the continuous time limit of the projected EKI can be found in Section 3.1.
Remark 2.3.
Since the right-hand side (RHS) of (2.11) is discontinuous, it is not obvious that a solution to this system exists. To ensure unique existence, we consider a smoothed version of (2.11) by approximating the limit with ideas inspired by barrier methods. We introduce the parametrized, convex optimization problems
(2.12) $\min_{u} \; f(u) - \tau \sum_{i=1}^d \left( \log(u_i - a_i) + \log(b_i - u_i) \right)$
with parameter $\tau > 0$ and inequality constraints $u_i > a_i$ and $u_i < b_i$. As $\tau \to 0$, the log-barrier functions become closer to the indicator function of the feasible set of the original problem. We equivalently consider the problems
(2.13) $\min_{u} \; \frac{1}{\tau} f(u) + B(u)$
in the following, where we define $B(u) = -\sum_{i=1}^d \left( \log(u_i - a_i) + \log(b_i - u_i) \right)$. We will approximate (2.11) for $\tau > 0$ by
$\frac{\mathrm{d}u}{\mathrm{d}t} = -\nabla f(u) - \tau \nabla B(u).$
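The effect of the log-barrier smoothing can be seen in a one-dimensional sketch: integrating the gradient flow of the barrier-regularized objective with explicit Euler keeps the iterate strictly inside the box and, for small smoothing parameter, drives it close to the constrained minimizer. The toy objective and all numbers are hypothetical:

```python
import numpy as np

# 1-d toy problem: f(u) = 0.5*(u - 2)^2 on the box [0, 1]; the constrained
# minimizer is u = 1 (all numbers hypothetical)
a, b, tau = 0.0, 1.0, 1e-3
dPhi = lambda u: u - 2.0

u, h = 0.5, 1e-3
for _ in range(20000):
    # explicit Euler for the gradient flow of f(u) - tau*(log(u-a) + log(b-u))
    u += h * (-dPhi(u) + tau * (1.0 / (u - a) - 1.0 / (b - u)))
# u stays strictly inside the box and ends up O(tau)-close to the minimizer u = 1
```

The barrier term diverges at the boundary, so the smoothed flow never touches the bounds; shrinking `tau` moves the stationary point of the flow toward the true constrained minimizer.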
Theorem 2.4.
Let $\tau > 0$ and let $u$ denote the solution of the smoothed initial value problem of (2.11) with $u(0)$ in the interior of $\mathcal{B}$. Then $u(t)$ converges to the unique minimizer of (2.12) as $t \to \infty$.
Proof.
We define $V(u) = \frac{1}{\tau} f(u) + B(u)$ and prove that $V$ is a strict Lyapunov function by the strict convexity of the optimization problem. The flow of $V$ satisfies
$\frac{\mathrm{d}}{\mathrm{d}t} V(u(t)) = \left\langle \nabla V(u(t)), \dot{u}(t) \right\rangle = -\tau \left\| \nabla V(u(t)) \right\|^2 \le 0,$
with equality if and only if $u(t)$ is the unique stationary point of (2.13); thus, the claim follows. ∎
Remark 2.5.
Remark 2.6.
Formally, we obtain in the limit $\tau \to 0$ the same result stated in Theorem 2.4 for $u$ being a solution of the initial value problem (2.11) with $u(0) \in \mathcal{B}$: We define $V(u) = \frac{1}{2}\|u - \bar{u}\|^2$ with $\bar{u}$ the minimizer of (2.9), and the index sets
$I_a = \{ i \mid \bar{u}_i = a_i \}, \qquad I_b = \{ i \mid \bar{u}_i = b_i \}.$
Since $\bar{u}$ is a feasible point, we get $(u - \bar{u})_i \ge 0$ for $i \in I_a$ and $(u - \bar{u})_i \le 0$ for $i \in I_b$, and we obtain
$\frac{\mathrm{d}}{\mathrm{d}t} V(u(t)) = \langle u - \bar{u}, \dot{u} \rangle \le -\langle u - \bar{u}, \nabla f(u) \rangle \le f(\bar{u}) - f(u) \le 0,$
where we have used the strict convexity of $f$ and the optimality of $\bar{u}$, and we conclude formally with $V(u(t)) \to 0$ as $t \to \infty$.
Corollary 2.7.
Let $\tau > 0$ and let $u_\tau$ denote the solution of the smoothed initial value problem of (2.11) with $u_\tau(0)$ in the interior of $\mathcal{B}$.
Proof.
2.3.2. The preconditioned projected gradient method
We consider the preconditioned version of the iteration (2.10) in discrete time, given by
(2.14) $u_{n+1} = \mathcal{P}\left( u_n - s D \nabla f(u_n) \right),$
where $D$ is a symmetric, positive definite matrix. It is well known that an arbitrary choice of $D$ does not guarantee a descent direction for any choice of step size $s > 0$. We will briefly discuss an example demonstrating that the preconditioning of the gradient flow does not lead to a descent direction in general (cp. [6]). This is highlighted in Figure 1.
Example 2.9.
We consider a 2-dimensional quadratic example: we define the quadratic function
and consider the minimization problem of $f$ subject to box constraints. The gradient of $f$ is given by
The current iterate is given by . As a preconditioner, we consider the symmetric, positive definite matrix
For step size $s > 0$, the next iterate is given by
where $\mathcal{P}$ is the projection onto the feasible set. Then,
i.e. for all step sizes $s > 0$ the value of the objective function increases.
Example 2.9 shows that a simple projection strategy for the EKI, which is a preconditioned gradient flow in the linear case, does not lead to a convergent descent method in general.
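The failure mode of Example 2.9 can be reproduced numerically. Since the concrete numbers of the example are not repeated here, the following is an analogous, hypothetical instance: a strictly convex quadratic, a box constraint and a symmetric positive definite preconditioner for which every projected step increases the objective:

```python
import numpy as np

# Hypothetical instance (not the numbers of Example 2.9): f(x) = 0.5*||x - c||^2
# with c = (-1, 0), feasible set x >= 0, so the constrained minimizer is the origin.
c = np.array([-1.0, 0.0])
f = lambda x: 0.5 * float(np.dot(x - c, x - c))
grad = lambda x: x - c

# symmetric positive definite preconditioner (det = 1, trace = 6, so both eigenvalues > 0)
D = np.array([[5.0, -2.0],
              [-2.0, 1.0]])
x = np.array([0.0, 1.0])          # current iterate, sitting on the boundary x1 = 0

steps = [0.01, 0.1, 0.5]
values = [f(np.clip(x - s * D @ grad(x), 0.0, None)) for s in steps]
# every projected preconditioned step yields the iterate (0, 1 + s),
# so the objective increases from f(x) = 1 for every step size s > 0
```

The off-diagonal coupling of `D` turns the descent direction for the free coordinate into an ascent direction once the bound coordinate is clipped away by the projection.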
To ensure a descent direction for the preconditioned projected gradient method, we follow the approach of [6, Proposition 1], which suggests using matrices that are diagonal with respect to the subset of indices containing the (nearly) active constraints,
(2.15) $I^+(u) = \left\{ i \;\middle|\; \left( u_i \le a_i + \varepsilon \text{ and } (\nabla f(u))_i > 0 \right) \text{ or } \left( u_i \ge b_i - \varepsilon \text{ and } (\nabla f(u))_i < 0 \right) \right\}$
for some $\varepsilon > 0$.
We will give more details on the modification of the preconditioner in the context of the EKI in the following. In particular, we will make use of data assimilation methods such as variance inflation to ensure a descent. This will be discussed in greater detail in Section 3.
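The remedy of [6] can be illustrated on a small hypothetical instance of the same type as Example 2.9: making the preconditioner diagonal with respect to the active index set restores descent for the projected step:

```python
import numpy as np

# same hypothetical setup as before: f(x) = 0.5*||x - c||^2, box x >= 0
c = np.array([-1.0, 0.0])
f = lambda x: 0.5 * float(np.dot(x - c, x - c))
grad = lambda x: x - c

D = np.array([[5.0, -2.0],
              [-2.0, 1.0]])
x = np.array([0.0, 1.0])                   # the first coordinate is at its lower bound

# zero out the off-diagonal couplings of every active coordinate, keeping its
# diagonal entry: D becomes diagonal with respect to the active index set
active = np.where(x <= 0.0 + 1e-12)[0]
D_mod = D.copy()
for i in active:
    D_mod[i, :] = 0.0
    D_mod[:, i] = 0.0
    D_mod[i, i] = D[i, i]

values = [f(np.clip(x - s * D_mod @ grad(x), 0.0, None)) for s in (0.01, 0.1, 0.5)]
# with the modified preconditioner every projected step decreases f
```

Decoupling the active coordinate means the projection can only remove a component of the step that was itself a descent direction, so the remaining move cannot increase the objective.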
3. Convergence analysis
In this section we introduce a variant of the projected EKI and derive the continuous time limit of the algorithm. We will provide a complete convergence analysis of the proposed modification in the linear setting. This includes an accuracy result for the proposed algorithm, the analysis of the ensemble collapse and the convergence to the truth.
Recall that the componentwise projection onto the box $\mathcal{B}$ is denoted by $\mathcal{P}$. To incorporate the projection into our scheme, we modify the procedure described in Section 2. We now define our prediction step in its projected form as
$\bar{u}_n = \frac{1}{J} \sum_{j=1}^J \mathcal{P}(u_n^{(j)}), \qquad \bar{G}_n = \frac{1}{J} \sum_{j=1}^J G(\mathcal{P}(u_n^{(j)})),$
and similarly for the covariances
$C^{up}(u_n) = \frac{1}{J} \sum_{j=1}^J \left( \mathcal{P}(u_n^{(j)}) - \bar{u}_n \right) \otimes \left( G(\mathcal{P}(u_n^{(j)})) - \bar{G}_n \right), \qquad C^{pp}(u_n) = \frac{1}{J} \sum_{j=1}^J \left( G(\mathcal{P}(u_n^{(j)})) - \bar{G}_n \right) \otimes \left( G(\mathcal{P}(u_n^{(j)})) - \bar{G}_n \right).$
Then, by using these estimates, we can construct our update formula in closed form,
(3.1) $u_{n+1}^{(j)} = \mathcal{P}\left( u_n^{(j)} + C^{up}(u_n) \left( C^{pp}(u_n) + \Gamma \right)^{-1} \left( y_{n+1}^{(j)} - G(\mathcal{P}(u_n^{(j)})) \right) \right),$
which is almost identical to the update formula for the non-projected EnKF.
In order to analyze the projection, we now derive a continuous time limit.
3.1. Continuous time limit
We now derive a continuous time limit for the projected EnKF for inverse problems. The form of the limit will differ from that described in Section 2.1, as we wish to quantify the limit in terms of directional derivatives. We begin by recalling the equations in closed form, given by the componentwise increments of (3.1). By using the Neumann expansion for part of the Kalman gain, we observe for positive semidefinite $C^{pp}(u_n)$ and scaled noise covariance $\frac{1}{h}\Gamma$ that
(3.2) $\left( C^{pp}(u_n) + \tfrac{1}{h} \Gamma \right)^{-1} = h \Gamma^{-1} + \mathcal{O}(h^2).$
By defining the increment direction $F^{(j)}$ below, using (3.2) and the definition of directional derivatives, we obtain
(3.3) $u_{n+1}^{(j)} = \mathcal{P}\left( u_n^{(j)} + h F^{(j)}(u_n) \right) + \mathcal{O}(h^2).$
Finally, we obtain the continuous time limit for the EnKF by
(3.4) $\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = \lim_{h \to 0^+} \frac{\mathcal{P}\left( u^{(j)} + h F^{(j)}(u) \right) - u^{(j)}}{h},$
where $F^{(j)}$ is given by
$F^{(j)}(u) = C^{up}(u)\, \Gamma^{-1} \left( y - G(u^{(j)}) \right).$
In the linear case we can write $F^{(j)}$ as
(3.5) $F^{(j)}(u) = -C(u)\, \nabla_u f(u^{(j)}),$
with $f$ from the minimization problem (2.9).
3.2. Variance inflation
The EnKF and the ESRF are known to have certain difficulties in high-dimensional settings. One such case is where the dimension of the parameter space is considerably larger than the size of the ensemble; as a result, the empirical covariance matrices can underestimate the uncertainty. One common way to alleviate this issue is through variance inflation [2, 3, 22].
The idea is to inflate the empirical covariance by another covariance matrix, e.g. that of the prior. This usually takes one of two forms, additive inflation or multiplicative inflation, defined below as
(3.6) $C(u) \mapsto C(u) + \alpha C_0,$
(3.7) $C(u) \mapsto (1 + \alpha)\, C(u),$
where $C_0$ is our prior covariance and $\alpha > 0$ is a constant. In the context of EKI, variance inflation was first applied in [31]. To keep our work consistent with previous work, we stick to the case of additive inflation (3.6). By incorporating this form of inflation, our gradient flow structure is modified to
(3.8) $\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = -\left( C(u) + \alpha C_0 \right) \nabla_u f(u^{(j)}).$
We note that if one were to apply the inflation to the projected method, the limit would differ in that the inflated direction would be projected as in (3.4).
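The effect of additive inflation (3.6) can be sketched numerically: with fewer particles than dimensions the empirical covariance is rank-deficient, while the inflated covariance is uniformly positive definite. The dimensions and the choice of an identity prior covariance are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
d, J, alpha = 5, 3, 0.1           # fewer particles than dimensions: C(u) is rank-deficient
U = rng.normal(size=(J, d))
u_bar = U.mean(axis=0)
C = (U - u_bar).T @ (U - u_bar) / J

C0 = np.eye(d)                    # prior covariance (illustrative choice)
C_infl = C + alpha * C0           # additive variance inflation, cf. (3.6)

eig_C = np.linalg.eigvalsh(C)
eig_infl = np.linalg.eigvalsh(C_infl)
# rank(C) <= J - 1 < d, so C has zero eigenvalues; the inflated covariance
# is positive definite with smallest eigenvalue at least alpha
```

The strictly positive smallest eigenvalue is what allows the inflated flow to leave the initial ensemble subspace, in line with the loss of the subspace property noted above.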
3.3. Convergence analysis in the linear case
In the linear case, we view the continuous time limit of the projected EnKF as a preconditioned gradient flow.
3.3.1. Transformed method for the EnKF
In Example 2.9 we have seen that, in the case of preconditioned projected gradient methods, it is not possible to ensure a descent direction for an arbitrary choice of a positive definite matrix $D$. Based on the results of Bertsekas [6], we will transform (3.4) in such a way that the preconditioner is diagonal with respect to an index set built similarly to the set $I^+$ from (2.15). Since we consider a system of particles, we will use a preconditioner which is diagonal with respect to an index set which depends on the whole ensemble of particles $u = (u^{(1)}, \dots, u^{(J)})$; in particular, we set
$\mathcal{I}(u) = \bigcup_{j=1}^{J} I^+(u^{(j)}).$
Similarly, we could also choose the index set to be
$\tilde{\mathcal{I}}(u) = \bigcap_{j=1}^{J} I^+(u^{(j)}).$
In this work we will focus on the choice $\mathcal{I}(u)$. Therefore, we consider the preconditioned gradient flow given by
(3.9) $\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = \lim_{h \to 0^+} \frac{\mathcal{P}\left( u^{(j)} - h D(u)\, \nabla_u f(u^{(j)}) \right) - u^{(j)}}{h},$
where $j = 1, \dots, J$ and the preconditioner $D(u)$ is symmetric, positive definite and diagonal with respect to the index set $\mathcal{I}(u)$, i.e.
(3.10) $D(u)_{ik} = D(u)_{ki} = 0 \quad \text{for } i \in \mathcal{I}(u),\; k \ne i.$
Let $u$ be a system of $J$ particles in $\mathbb{R}^d$. Without loss of generality, we assume that the indices are ordered such that $\mathcal{I}(u) = \{1, \dots, m\}$ for some $m \le d$. Hence, the preconditioner can be written in block form
$D(u) = \begin{pmatrix} D_1(u) & 0 \\ 0 & D_2(u) \end{pmatrix},$
where $D_1(u) \in \mathbb{R}^{m \times m}$ is diagonal.
Remark 3.2.
Consider the continuous time limit of the projected EnKF in equation (3.4). When we introduce variance inflation for (3.5) with the help of a diagonal matrix, e.g. by
(3.11) $F^{(j)}(u) = -\left( C(u) + \alpha \operatorname{diag}(C_0) \right) \nabla_u f(u^{(j)}),$
then the preconditioner is diagonal with respect to the index set $\mathcal{I}(u)$ and can be written in the form of (3.10).
This can be seen as follows: since $u$ evolves by (3.9), we have that $u_i^{(j)} \in [a_i, b_i]$ for all $i$ and $j$, i.e. the particles remain feasible for all times. For simplicity, assume that for an index $i \in \mathcal{I}(u)$ all particles attain the corresponding bound, i.e. the particles coincide in the $i$th component. We then obtain $\left( u^{(j)} - \bar{u} \right)_i = 0$ for all $j$, and finally $C(u)_{ik} = C(u)_{ki} = 0$ for all $k$. This leads to a preconditioner of the form (3.10), whose entries for $i \in \mathcal{I}(u)$ reduce to the diagonal contribution $\alpha \operatorname{diag}(C_0)_{ii}$.
Hence, we can view the projected EnKF with variance inflation as a special case of the presented method.
3.3.2. Transformed method for the ESRF
The continuous time limit (3.4) and the gradient flow (3.5) can be adapted to the case of the ESRF, and the same holds for the projected preconditioned limit above. In order to do so, the closed form needs to be changed such that the limit takes a form similar to (2.4).
The resulting transformed method for the ESRF is given by
(3.12) $\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = \lim_{h \to 0^+} \frac{\mathcal{P}\left( u^{(j)} - \frac{h}{2} D(u) \left( \nabla_u f(u^{(j)}) + \nabla_u f(\bar{u}) \right) \right) - u^{(j)}}{h},$
where now $j = 1, \dots, J$ and the preconditioner is again given by (3.10).
3.3.3. Convergence Results
Let $\bar{u}$ be a KKT point for (2.9) and let $u$ be the solution of the system (3.9). Our aim is to prove the convergence of the residual $u^{(j)} - \bar{u}$ for $t \to \infty$. To show convergence, we have to control the empirical covariance over time. In particular, in the following proposition we will prove that we can bound the ensemble spread over time.
Proposition 3.3.
Proof.
We consider and show that , which implies the monotonicity of the quantity . We have