    # Iterate Averaging and Filtering Algorithms for Linear Inverse Problems

It has been proposed that classical filtering methods, like the Kalman filter and 3DVAR, can be used to solve linear statistical inverse problems. In the work of Igelsias, Lin, Lu, Stuart (2017), error estimates were obtained for this approach. By optimally tuning a free parameter in the filters, the authors were able to show that the mean squared error can be minimized. In the present work, we prove that by (i) considering the problem in a weaker, weighted, space and (ii) applying simple iterate averaging of the filter output, 3DVAR will converge in mean square, unconditionally on the parameter. Without iterate averaging, 3DVAR cannot converge by running additional iterations with a given, fixed, choice of parameter. We also establish that the Kalman filter's performance cannot be improved through iterate averaging. We illustrate our results with numerical experiments that suggest our convergence rates are sharp.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

The focus of this work is on the inverse problem

 (1.1) y=Au†+η,

where given the noisy observation of , we wish to infer . In our setting, is a compact operator between Hilbert spaces and

is white noise, modelling measurement error. This problem is well known to be ill-posed in the infinite dimensional setting, as

has an unbounded inverse.

In the work of , the authors considered two classical filtering algorithms, the Kalman filter and 3DVAR, with the goal of using them to solve (1.1). As discssued in , the filtering methodology for (1.1) requires the introduction, conceptually, of the artificial dynamical system

 (1.2a) un =un−1, (1.2b) yn =Aun+ηn.

Here, at algorithmic step , is the quantity of interest, and is the noisy observation. Having ascribed a notion of time to the problem, we can then apply a filter. This provides a mechanism for estimating in (1.1) in an online setting, where a sequence of i.i.d. observations, , is available.Ṫhis corresponds to “Data Model 1” of .

Amongst the key results of , reviewed in detail below, is that under sufficiently strong assumptions, the Kalman filter will recover the truth in mean square, unconditionally on the choice of parameters in the filter. Under somewhat weaker assumptions, the error will only be bounded, though through minimax selection of the parameter, an optimal error can be achieved for a given number of iterations.

3DVAR is a simplification of Kalman that has is demonstrated to have, at best, bounded error, though, again, through minimax parameter tuning, it performs comparably to Kalman. Kalman is more expensive than 3DVAR, as it requires updating an entire covariance operator at each iteration. For finite dimensional approximations, this may require costly matrix-matrix multiplications.

Here, by working in a weaker, weighted, norm and averaging the iterates, we are able to establish that 3DVAR will unconditionally converge in mean square for all admissible filter parameters. Further, we show that this simple iterate averaging cannot improve the performance of the Kalman filter.

### 1.1. Filtering Algorithms

The Kalman filter is a probabilistic filter that estimates a Gaussian distribution,

, for at each iterate. Given a starting mean and covariance, and , the updates are as follows:

 (1.3a) mn =Knyn+(I−KnA)mn−1, (1.3b) Cn =(I−KnA)Cn−1, (1.3c) Kn =Cn−1A∗(ACn−1A∗+γ2I)−1.

Here, is the so-called “Kalman gain.” is a point estimate of .

While Kalman is a probabilistic filter, 3DVAR is not. It is obtained by applying Kalman with a fixed covariance operator. for some fixed operator :

 (1.4a) un =Kyn+(I−KA)un−1, (1.4b) K =(A∗A+αΣ−1)−1A∗.

Note that 3DVAR corresponds to an infinite dimensional AR(1) process. Our aim is to build on the framework and methodology of .

### 1.2. Key Assumptions and Prior Results

In , the following assumptions were made to obtain results. We retain these assumptions for our results.

###### Assumption 1.
1. with , , and a self-adjoint positive definite trace class operator with densely defined.

2. induces a Hilbert scale and there exists constants , such that induces an equivalent norm:

 (1.5) C−1∥x∥ν≤∥Ax∥≤C∥x∥ν,∥∙∥ν=∥Σν2∙∥.
3. The initial error is sufficiently smooth,

 (1.6) m0−u†∈Dom(Σ−s2),0≤s≤a+2,

where we replace with in the case of 3DVAR in the above expression.

Under this first set of assumptions, Iglesias et al. established

###### Theorem 1.1 (Theorem 4.1 of ).

The Kalman filter admits the mean square error bound

 E[∥mn−u†∥2]≲(nα)−sa+1+γ2αTrΣ

and

###### Theorem 1.2 (Theorem 5.1 of ).

3DVAR admits the mean square error bound

 E[∥un−u†∥2]≲(nα)−sa+1+γ2αTrΣlogn.

At fixed values of , Theorems 1.1 and 1.2 preclude convergence, and, in the case of 3DVAR, the error may even grow. However, there are two free parameters: the number of iterations and the regularization parameter . Indeed, within a Bayesian framework, can be interpreted as the strength of a prior relative to a likelihood. By tuning these parameters one can either:

• Select so as to minimize the error at a given ;

• Select so as the minimize the error for a given .

This is accomplished in the usual way by minmizing the upper bounds on the error over and . It suggests that the error can be made arbitrarily small. However, in both expressions, there is an unknown constant. If the error at the given, optimal choice of and is inadequate, one must choose a different value of and rerun the algorithm with this new choice. A benefit of the present work is that, by using iterate averaging, the error of 3DVAR can always be reduced by computing additional iterates, without adjusting and discarding previously computed iterations.

Somewhat stronger results were obtained in  under a simultaneous diagonalization assumption.

###### Assumption 2.
1. and

simultaneously diagonalize with respective eigenvalues

and , and these eigenvalues satisfy

 (1.7) σi=i−1−2ϵ,ai≍i−p,ϵ>0,p>0.
2. (or in 3DVAR) and satisfies, for ,

 (1.8) ∞∑k=1k2β|u†k|2<∞.

Under the diagonalization assumption, one has

###### Theorem 1.3 (Theorem 4.2 of ).

Under Assumption 2, for the Kalman filter,

 (1.9) E[∥mn−u†∥2]≲(nα)−2β1+2ϵ+2p+γ2n−2ϵ1+2ϵ+2pα−1+2p1+2ϵ+2p

and

###### Theorem 1.4 (Theorem 5.2 of ).

Under Assumption 2, for 3DVAR,

 (1.10) E[∥un−u†∥2]≲(nα)−2β1+2ϵ+2p+Cγ2α−1+2p1+2ϵ+2p.

Now, the Kalman filter will converge at any choice of parameter, while 3DVAR has at worst a bounded error. Again, can be tuned so as to obtain the minimax convergence rate.

### 1.3. Main Results

The main results of this paper are contained in the following theorems.

First, we have the elementary result that 3DVAR, without averaging, cannot converge at fixed parameter choices:

###### Theorem 1.5.

Under Assumption 1 in dimension one, if generated by 3DVAR, then

 (1.11) E[|un−u†|2]≥γ2K2.

As the method cannot converge in dimension one, it has no hope of converging in higher dimensions.

By time averaging,

 (1.12) ¯un=1nn∑k=1uk,=1nun+n−1n¯un−1,

under some additional assumptions, we can obtain convergence independently of the choice of :

###### Theorem 1.6.

Under Assumption 1, fix and , and, having set these indices, assume that is trace class. Then

 E[∥¯un−u†∥2t]≲(nα)−s+t1+ν∥z0∥2+γ2αTr(Σt+1−τv(1+ν))(nα)−τv

where is the solution to

 (1.13) Σ−12(u0−u†)=(B∗B)s−12(1+ν)z0.

Consequently, we will have unconditional mean squared convergence convergence of the iterate averaged value, , provided:

• We study the problem in a sufficiently weak weighted space () and/or have sufficiently smooth data ();

• has a sufficiently well behaved spectrum allowing . Note that taking will not require additional assumptions on , but will require for convergence.

We emphasize that iterate averaging is a post-processing step, requiring no modification of the underlying 3DVAR iteration.

Under a modified version of Assumption 2,

###### Assumption 2′.
1. and simultaneously diagonalize with respective eigenvalues and , and these eigenvalues satisfy

 (1.14) σi≍i−1−2ϵ,ai≍i−p,ϵ>0,p>0.
2. Fixing a -norm in which to study convergence, assume the data satisfies

 (1.15) ∞∑k=1k2β|u0,k−u†k|2<∞.
###### Theorem 1.7.

Under Assumption 2, and having fixed , assume satisfy

 (1.16a) τb ≤t(1+2ϵ)+2β2(1+2ϵ+2p)≡¯τb (1.16b) τv

then

 E[∥¯un−u†∥2t]≲(nα)−2τb+γ2α(nα)−τv

In contrast to iterate averaged 3DVAR, there is no gain to iterate averaging for Kalman:

###### Theorem 1.8.

For the scalar Kalman filter, take

. Then the bias and variance of the iterate-averaged mean,

satisfy the inequalities

 |E[¯mn]−u†| ≥|E[mn]−u†|, Var(¯mn) ≥Var(mn).

### 1.4. Outline

The structure of this paper is as follows. In Section 2 we review certain background results needed for our main results. Section 3 examines the scalar case, and it includes proofs of Theorems 1.5 and 1.8. We prove Theorems 1.6 and 1.7 in Section 4. Numerical examples are given in Section 5. We conclude with a brief discussion in Section 6.

Acknowledgements: The authors thank A.M. Stuart for suggesting an investigation of this problem. This work was supported by US National Science Foundation Grant DMS-1818716. The content of this work originally appeared in  as a part of F.G. Jones’s PhD dissertation. Work reported here was run on hardware supported by Drexel’s University Research Computing Facility.

## 2. Preliminary Results

In this section, we establish some identities and estimates that will be crucial to proving our main results.

Much of our analysis relies on spectral calculus involving the following rational functions:

 (2.1) rn,α(λ) =(αα+λ)n, (2.2) qn,α(λ) =1λ{1−(αα+λ)n}=λ−1(1−rn,α(λ)).

Throughout, and . These are related by the identity

 (2.3) m∑k=1rk,α(λ)=αqm,α(λ).

The following estimates are established in  and , particularly Section 2.2 of the latter reference:

###### Lemma 2.1.

For and ,

 (2.4) 0n.
###### Lemma 2.2.

For , ,

 (2.6) λpqn,α(λ)≤{(nα)1−p,p∈[0,1],Λp−1,p>1, (2.7) λpqn,α(λ)≤λp−1.

Next, we recall the following result on Hilbert scales,

###### Proposition 2.3.

With , there exists a constant , such that for ,

 D−1∥x∥θ(1+ν)≤∥(B∗B)θ2x∥≤D∥x∥θ(1+ν)

and .

This result, based on a duality argument, is proven in Lemma 4.1 of . See, also, Section 8.4 of , particularly Corollary 8.22.

We also have a few useful identities for the filters which we state without proof.

###### Lemma 2.4.

For the Kalman filter, the mean and covariance operators and the Kalman gains satisfy the identities

 mn =(γ2n−1C−10+A∗A)−1(A∗¯yn+γ2n−1C−10m0) C−1n =C−1n−1+γ−2A∗A=C−10+γ−2nA∗A Kn =(γ2C−1n−1+A∗A)−1A∗=(γ2C−10+nA∗A)−1A∗=γ−2CnA∗.
###### Lemma 2.5.

For 3DVAR,

 ¯un=n−1∑k=0n−kn(I−KA)kK¯yn−k+n−1∑k=01n(I−KA)k(I−KA)u0.
###### Corollary 2.6.

Letting , ,

 ¯vn=n−1∑k=0n−kn(I−KA)kK¯ηn−k+n−1∑k=01n(I−KA)k(I−KA)v0.
###### Remark 2.7.

As this is a linear problem, it will be sufficient to study the behavior of to infer convergence of to .

For the analysis of 3DVAR, the essential decomposition is into a bias and a variance term. From Corollary 2.6, these are

 (2.8) ¯Ibiasn =n−1∑k=01n(I−KA)k(I−KA)v0, (2.9) ¯Ivarn =n−1∑k=0n−kn(I−KA)kK¯ηn−k.

The bias and variance can be expressed in the more useful forms using :

###### Lemma 2.8.
 (2.10) ¯Ibiasn =αnΣ12qn,α(B∗B)Σ12v0, (2.11) ¯Ivarn =1nn∑j=1Σ12qn−j+1,α(B∗B)B∗ηj.
###### Proof.

First, observe that

 I−KA=Σ12α(αI+B∗B)Σ−12.

Using this in (2.8) together with spectral calculus applied to positive self-adjoint compact operator , along with (2.3),

 ¯Ibiasn=1nn−1∑k=0Σ1/2αk(αI+B∗B)−k+1Σ−1/20v0=1nn∑k=1Σ1/2rk,α(B∗B)Σ−1/2v0=αnΣ12qn,α(B∗B)Σ−12v0.

Applying the same computations to (2.9), we have,

 ¯Ivarn=n−1∑k=0n−knα−1Σ120rk+1,α(B∗B)B∗¯ηn−k=1nn∑j=1{n−j∑k=0α−1Σ12rk+1,α(B∗B)B∗}ηj=1nn∑j=1Σ12qn−j+1,α(B∗B)B∗ηj.

## 3. Analysis of the Scalar Problem

Before proceeding to the general, infinite-dimensional case, it is instructive to consider the scalar problem, where and , , and are now scalars.

This setting will also allow us to establish the limitations of both 3DVAR and the Kalman filter alluded to in the introduction. The scalar problem also serves as a building block in the case that it is possible to simultaneously diagonalize operators and in the general case.

### 3.1. 3dvar

Operator is now just the scalar constant, the regularization remains , and the 3DVAR gain defined in (1.4) is now the scalar.

First, we have prove Theorem 1.5, which asserts that the 3DVAR iteration cannot converge in mean square:

###### Proof.

Since , we write for . By (1.4),

 un−u†=Kηn+KAu†+(1−KA)un−1−u†=Kηn+(1−KA)(un−1−u†).

Consequently,

 E[|un−u†|2]=E[|κηn|2]+E[|(1−KA)(un−1−u†)|2]≥E[|Kηn|2]=K2γ2.

Next, studying the bias and variance of the time averaged problem, given by (2.8) and (2.9), we prove

###### Theorem 3.1.

For scalar time averaged 3DVAR, for

 E[|¯un−u†|2]≤(A2Σ)−2τb|v0|2(nα)−2τb+Σγ2α(A2Σ)−τv(nα)−τv.

Thus, we have unconditional convergence for any choice for , something that we do not have for 3DVAR without any iterate averaging. Indeed, Theorem 1.5 tells us that for any fixed set of parameters, we would always have a finite error, regardless of . The rate of convergence is greatest when and .

To obtain the result, we make use of the bias variance decomposition and expressions (2.10) and (2.11). In the scalar case, , so that

 (3.1) ∣∣¯Ibiasn∣∣2=(αn)2qn,α(ΣA2)2|v0|2.

Applying (2.7) to this expression, we immediately obtain

###### Proposition 3.2.

For ,

 (3.2) ∣∣¯Ibiasn∣∣2≤(A2Σ)−2τb|v0|2(nα)−2τb.

For the variance, we have the result

###### Proposition 3.3.

Let ,

 (3.3) E[|¯Ivarn|2]≤Σγ2α(A2Σ)−τv(nα)−τv.
###### Proof.

For the scalar case of (2.11),

 E[|¯Ivarn|2]=γ2(AΣ)2n2n∑j=1qj,α(ΣA2)2.

Then, using Lemma 2.2,

 E[|¯Ivarn|2]=γ2(AΣ)2n2n∑j=1qj,α(A2Σ)2=γ2Σn2(A2Σ)1−(1+τv)n∑j=1[(A2Σ)1+τv2qj,α(A2Σ)]2≤Σγ2n2(A2Σ)−τvn∑j=1(jα)2(1−1+τv2)≤Σγ2(A2Σ)−τvn2n(nα)1−τv=Σγ2α(A2Σ)−τv(nα)−τv.

###### Proof of Theorem 3.1.

The result then follows immediately by combining the two preceding propositions.

### 3.2. Kalman Filter

Next, we provide a proof of Theorem 1.8, showing there is no improvement in mean squared convergence of Kalman under iterate averaging.

###### Proof.

Using Lemma 2.4, for the -the estimate of the mean,

 mk=(αΣk+a2)−1(A¯yk+αΣkm0)=(αΣk+A2)−1(A2u†+A¯ηk+αΣkm0)=(1+αA2Σk)−1u†+(1+A2Σkα)−1m0+(A+αAΣk)−1¯ηk.

and without averaging,

 E[mn]−u† =(1+A2Σnα)−1(m0−u†), Var(mn) =(A+αAΣn)−2γ2n.

Then, with averaging, for the bias,

 E[¯mn]−u†=1nn∑k=1(1+A2Σkα)−1(m0−u†),

and

 |E[¯mn]−u†|2=∣∣ ∣∣1nn∑k=1(1+A2Σkα)−1∣∣ ∣∣2|m0−u†|2≥∣∣ ∣∣1nn∑k=1(1+A2Σnα)−1∣∣ ∣∣2|m0−u†|2=|E[mn]−u†|2.

For the variance, first note

 ¯mn−E[¯mn]=1nn∑k=1(A+αAΣk)−1¯ηk=1nn∑k=1(A+αAΣk)−1{k∑j=1ηj}=1nn∑j=1ηj⎧⎨⎩n∑k=j(A+αAΣk)−1⎫⎬⎭.

Then, by dropping all but the -th term in the inner sum,

## 4. Analysis of the Infinite Dimensional Problem

We return to the bias and variance of 3DVAR in the general, potentially infinite dimensional, setting and obtain estimates on the terms.

### 4.1. General Case

Here, we prove Theorem 1.6.

###### Proposition 4.1.

Under Assumption 1, with ,

 (4.1) ∥¯Ibiasn∥2t≲(nα)−s+t1+ν∥z0∥2

where solves (1.13).

###### Remark 4.2.

The fastest possible decay available for the squared bias in Proposition 4.1 is when and .

###### Proof.

We make use of bias term from Lemma 2.8, allowing us to write

 ∥¯Ibiasn∥2t=∥∥∥αnΣt+12qn,α(B∗B)Σ−12v0∥∥∥2.

Next, we make use of (1.5) and argue as in the Appendix of , applying Proposition 2.3. Since, by assumption, , . By Proposition 2.3, letting , allows us to conclude the existence of . Therefore,

 ∥¯Ibiasn∥2t=∥∥∥αnΣt+12qn,α(B∗B)(B∗B)s−12(1+ν)z0∥∥∥2.

Next, using Proposition 2.3 again, now with ,

 ∥¯Ibiasn∥2t≲∥∥∥αn(B∗B)t+12(1+ν)qn,α(B∗B)(B∗B)s−12(1+ν)z0∥∥∥2=∥∥∥αn(B∗B)s+t2(1+ν)qn,α(B∗B)z0∥∥∥2≤(sup0≤λ≤∥B∗B∥∣∣∣αnλs+t2(1+ν)qn,α(λ)∣∣∣)2∥z0∥2≤(nα)−s+t1+ν∥z0∥2.

The last inequality holds since, and , so that allowing for the application of Lemma 2.2. ∎

###### Proposition 4.3.

Under Assumption 1, for , , and for this choice of and , assume is trace class. Then

 E[∥¯Ivarn∥2t]≲γ2αTr(Σt+1−τv(1+ν))(nα)−τv.
###### Remark 4.4.

The fastest possible decay in the variance will be when and is sufficiently large such that is trace class. However, the bias term requires . This requires the identity operator to be trace class which will not hold in infinite dimensions.

###### Proof.

We begin with equation (2.11) and using that for any bounded operator and positive self adjoint trace class operator , ,

 E[∥¯Ivarn∥2t]=1n2n∑j=1E[∥Σt+12qn−j+1,α(B∗B)B∗ηj∥2]=γ2n2n∑j=1Tr(Σt+12qj,α(B∗B)(B∗B)qj,α(B∗B)Σt+12)=γ2n2n∑j=1Tr(Σt+1−τv(1+ν)(Στv1+ν2(B∗B)12qj,α(B∗B)(B∗B))2)≤γ2n2n∑j=1∥Στv1+ν2(B∗B)12qj,α(B∗B)(B∗B)∥2Tr(Σt+1−τv(1+ν)).

Using Proposition 2.3 with and Lemma 2.2,

 ∥Στv1+ν2(B∗B)12qj,α(B∗B)(B∗B)∥≲∥(B∗B)1+τv2qj,α(B∗B)∥≲supλ∈[0,∥B∗B∥]λ1+τv2qj,α(λ)≲(jα)1−1+τv2

Therefore,

 E[∥¯Ivarn∥2t]≲γ2n2Tr(Σt+1−τv(1+ν))n∑j=1(jα)1−τv≲γ2αTr(Σt+1−τv(1+ν))(nα)−τv

###### Proof of Theorem 1.6.

The theorem immediately follows from the two preceding propositions. ∎

### 4.2. Simultaneous Diagonalization

A sharper result is available under the simultaneous diagonalization Assumption 2 . Indeed, let us assume that and simultaneously diagonalize against the orthonormal set , with eigenvalues

 (4.2) Σφk=σkφk,A∗Aφk=a2kφk.

The assumptions of (1.14) and (1.15) are equivalent to those of (1.5) and (1.6) under the identifications:

 (4.3) ν(1+2ϵ)=2p,s(1+2ϵ)=2β.

Also, observe that, letting

 (4.4) ω=1+2ϵ1+2ϵ+2p,

we have the relationship

 (4.5) σk≍(σka2k)ω
###### Proposition 4.5.

Under Assumption 2, let satisfy condition (1.16a),

 ∥∥¯