The focus of this work is on the inverse problem
where given the noisy observation of , we wish to infer . In our setting, is a compact operator between Hilbert spaces and
is white noise, modelling measurement error. This problem is well known to be ill-posed in the infinite dimensional setting, as the compact operator has an unbounded inverse.
In the work of , the authors considered two classical filtering algorithms, the Kalman filter and 3DVAR, with the goal of using them to solve (1.1). As discussed in , the filtering methodology for (1.1) requires the introduction, conceptually, of the artificial dynamical system
Here, at algorithmic step , is the quantity of interest, and is the noisy observation. Having ascribed a notion of time to the problem, we can then apply a filter. This provides a mechanism for estimating in (1.1) in an online setting, where a sequence of i.i.d. observations, , is available. This corresponds to “Data Model 1” of .
Amongst the key results of , reviewed in detail below, is that under sufficiently strong assumptions, the Kalman filter will recover the truth in mean square, unconditionally on the choice of parameters in the filter. Under somewhat weaker assumptions, the error will only be bounded, though through minimax selection of the parameter, an optimal error can be achieved for a given number of iterations.
3DVAR is a simplification of Kalman that has been demonstrated to have, at best, bounded error, though, again, through minimax parameter tuning, it performs comparably to Kalman. Kalman is more expensive than 3DVAR, as it requires updating an entire covariance operator at each iteration. For finite dimensional approximations, this may require costly matrix-matrix multiplications.
Here, by working in a weaker, weighted, norm and averaging the iterates, we are able to establish that 3DVAR will unconditionally converge in mean square for all admissible filter parameters. Further, we show that this simple iterate averaging cannot improve the performance of the Kalman filter.
1.1. Filtering Algorithms
The Kalman filter is a probabilistic filter that estimates a Gaussian distribution,, for at each iterate. Given a starting mean and covariance, and , the updates are as follows:
Here, is the so-called “Kalman gain.” is a point estimate of .
While Kalman is a probabilistic filter, 3DVAR is not. It is obtained by applying Kalman with a fixed covariance operator, for some fixed operator :
Note that 3DVAR corresponds to an infinite dimensional AR(1) process. Our aim is to build on the framework and methodology of .
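To make the contrast between the two filters concrete, the following is a minimal scalar sketch. It assumes the textbook form of the Kalman update for a static state observed through y = a*u + noise, and a 3DVAR gain obtained by freezing the covariance at a fixed value. The names `a`, `gamma2`, `sigma`, and `gain` are illustrative choices, not notation taken from the displayed equations.

```python
def kalman_step(m, c, y, a, gamma2):
    """One scalar Kalman update for a static state (assumed textbook form).

    m, c   : current mean and variance
    y      : new observation, modelled as a * u_true + noise
    a      : scalar forward map (hypothetical name)
    gamma2 : noise variance (hypothetical name)
    """
    k = c * a / (a * a * c + gamma2)  # Kalman gain, recomputed every step
    m_new = m + k * (y - a * m)       # mean update
    c_new = (1.0 - k * a) * c         # variance update
    return m_new, c_new


def threedvar_gain(sigma, a, gamma2):
    """Gain obtained by freezing the variance at a fixed value sigma."""
    return sigma * a / (a * a * sigma + gamma2)


def threedvar_step(u, y, a, gain):
    """One scalar 3DVAR update with a fixed gain.

    Algebraically u_new = (1 - gain * a) * u + gain * y, which is the
    AR(1) structure: a fixed contraction plus a noisy forcing term.
    """
    return u + gain * (y - a * u)
```

The only structural difference in this sketch is that `kalman_step` carries and updates a variance, while `threedvar_step` reuses one precomputed gain at every iteration.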
1.2. Key Assumptions and Prior Results
In , the following assumptions were made to obtain results. We retain these assumptions for our results.
with , , and a self-adjoint positive definite trace class operator with densely defined.
induces a Hilbert scale and there exist constants , such that induces an equivalent norm:
The initial error is sufficiently smooth,
where we replace with in the case of 3DVAR in the above expression.
Under this first set of assumptions, Iglesias et al. established
Theorem 1.1 (Theorem 4.1 of ).
The Kalman filter admits the mean square error bound
Theorem 1.2 (Theorem 5.1 of ).
3DVAR admits the mean square error bound
At fixed values of , Theorems 1.1 and 1.2 preclude convergence, and, in the case of 3DVAR, the error may even grow. However, there are two free parameters: the number of iterations and the regularization parameter . Indeed, within a Bayesian framework, can be interpreted as the strength of a prior relative to a likelihood. By tuning these parameters one can either:
Select so as to minimize the error at a given ;
Select so as to minimize the error for a given .
This is accomplished in the usual way by minimizing the upper bounds on the error over and . It suggests that the error can be made arbitrarily small. However, in both expressions, there is an unknown constant. If the error at the given, optimal choice of and is inadequate, one must choose a different value of and rerun the algorithm with this new choice. A benefit of the present work is that, by using iterate averaging, the error of 3DVAR can always be reduced by computing additional iterates, without adjusting and discarding previously computed iterations.
Somewhat stronger results were obtained in  under a simultaneous diagonalization assumption.
simultaneously diagonalize with respective eigenvalues and , and these eigenvalues satisfy
(or in 3DVAR) and satisfies, for ,
Under the diagonalization assumption, one has
Now, the Kalman filter will converge at any choice of parameter, while 3DVAR has at worst a bounded error. Again, can be tuned so as to obtain the minimax convergence rate.
1.3. Main Results
The main results of this paper are contained in the following theorems.
First, we have the elementary result that 3DVAR, without averaging, cannot converge at fixed parameter choices:
Under Assumption 1 in dimension one, if the iterates are generated by 3DVAR, then
As the method cannot converge in dimension one, it has no hope of converging in higher dimensions.
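A short calculation suggests why a positive error floor is to be expected in the scalar case. The notation here is an assumption made for illustration: $a > 0$ is the scalar forward map, $u^\dagger$ the truth, $\gamma^2$ the noise variance, $K$ the fixed 3DVAR gain with $0 < Ka < 1$, and $e_n = u_n - u^\dagger$ the error. The iteration is taken in the standard form $u_{n+1} = u_n + K(y_{n+1} - a u_n)$ with deterministic $u_0$ and i.i.d. mean-zero noise $\eta_n$.

```latex
\begin{align*}
e_{n+1} &= (1 - Ka)\, e_n + K \eta_{n+1}, \\
\mathbb{E}\, e_n^2 &= (1 - Ka)^{2n} e_0^2
    + K^2 \gamma^2 \sum_{j=0}^{n-1} (1 - Ka)^{2j}, \\
\lim_{n \to \infty} \mathbb{E}\, e_n^2
    &= \frac{K^2 \gamma^2}{1 - (1 - Ka)^2}
     = \frac{K \gamma^2}{a\,(2 - Ka)} > 0 .
\end{align*}
```

Under these assumptions the bias contribution decays geometrically, but the variance contribution saturates at a strictly positive value, so the mean square error cannot tend to zero at fixed gain.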
By time averaging,
under some additional assumptions, we can obtain convergence independently of the choice of :
Under Assumption 1, fix and , and, having set these indices, assume that is trace class. Then
where is the solution to
Consequently, we will have unconditional mean squared convergence of the iterate averaged value, , provided:
We study the problem in a sufficiently weak weighted space () and/or have sufficiently smooth data ();
has a sufficiently well behaved spectrum allowing . Note that taking will not require additional assumptions on , but will require for convergence.
We emphasize that iterate averaging is a post-processing step, requiring no modification of the underlying 3DVAR iteration.
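The post-processing nature of the averaging can be sketched as follows. This assumes the average is the plain arithmetic mean of the first n iterates, and the function name `running_average` is hypothetical. The iterates are consumed as they arrive, so the filter that produces them is left untouched.

```python
def running_average(iterates):
    """Return the running arithmetic means of a sequence of iterates.

    Uses the online update avg_n = avg_{n-1} + (u_n - avg_{n-1}) / n,
    so no iterate needs to be stored or recomputed.
    """
    averages = []
    avg = 0.0
    for n, u in enumerate(iterates, start=1):
        avg += (u - avg) / n
        averages.append(avg)
    return averages
```

Because the update only needs the previous average and the newest iterate, additional filter steps can be appended at any time without discarding earlier work.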
Under a modified version of Assumption 2,
and simultaneously diagonalize with respective eigenvalues and , and these eigenvalues satisfy
Fixing a -norm in which to study convergence, assume the data satisfies
Under Assumption 2, and having fixed , assume satisfy
In contrast to iterate averaged 3DVAR, there is no gain to iterate averaging for Kalman:
For the scalar Kalman filter, take . Then the bias and variance of the iterate-averaged mean, , satisfy the inequalities
The structure of this paper is as follows. In Section 2 we review certain background results needed for our main results. Section 3 examines the scalar case, and it includes proofs of Theorems 1.5 and 1.8. We prove Theorems 1.6 and 1.7 in Section 4. Numerical examples are given in Section 5. We conclude with a brief discussion in Section 6.
Acknowledgements: The authors thank A.M. Stuart for suggesting an investigation of this problem. This work was supported by US National Science Foundation Grant DMS-1818716. The content of this work originally appeared in  as a part of F.G. Jones’s PhD dissertation. Work reported here was run on hardware supported by Drexel’s University Research Computing Facility.
2. Preliminary Results
In this section, we establish some identities and estimates that will be crucial to proving our main results.
Much of our analysis relies on spectral calculus involving the following rational functions:
Throughout, and . These are related by the identity
For and ,
For , ,
Next, we recall the following result on Hilbert scales,
With , there exists a constant , such that for ,
We also have a few useful identities for the filters which we state without proof.
For the Kalman filter, the mean and covariance operators and the Kalman gains satisfy the identities
Letting , ,
As this is a linear problem, it will be sufficient to study the behavior of to infer convergence of to .
For the analysis of 3DVAR, the essential decomposition is into a bias and a variance term. From Corollary 2.6, these are
The bias and variance can be expressed in the more useful forms using :
3. Analysis of the Scalar Problem
Before proceeding to the general, infinite-dimensional case, it is instructive to consider the scalar problem, where and , , and are now scalars.
This setting will also allow us to establish the limitations of both 3DVAR and the Kalman filter alluded to in the introduction. The scalar problem also serves as a building block in the case that it is possible to simultaneously diagonalize operators and in the general case.
Operator is now just the scalar constant, the regularization remains , and the 3DVAR gain defined in (1.4) is now the scalar.
First, we prove Theorem 1.5, which asserts that the 3DVAR iteration cannot converge in mean square:
Since , we write for . By (1.4),
For scalar time averaged 3DVAR, for
Thus, we have unconditional convergence for any choice for , something that we do not have for 3DVAR without any iterate averaging. Indeed, Theorem 1.5 tells us that for any fixed set of parameters, we would always have a finite error, regardless of . The rate of convergence is greatest when and .
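The contrast between the last iterate and the averaged iterate can be explored with a small Monte Carlo experiment. This is only a sketch under stated assumptions: the scalar model y = a*u_true + noise, a fixed gain with 0 < gain*a < 1, and all parameter values and names (`simulate_3dvar`, `gain`, `u0`, and so on) are illustrative rather than taken from the paper.

```python
import random


def simulate_3dvar(n_steps, n_trials, a=1.0, u_true=1.0, gamma=1.0,
                   gain=0.5, u0=0.0, seed=0):
    """Estimate the mean square error of scalar 3DVAR by Monte Carlo.

    Returns (mse_last, mse_avg): the estimated mean square error of the
    final iterate and of the arithmetic mean of all iterates.
    """
    rng = random.Random(seed)
    mse_last = 0.0
    mse_avg = 0.0
    for _ in range(n_trials):
        u = u0
        avg = 0.0
        for n in range(1, n_steps + 1):
            y = a * u_true + rng.gauss(0.0, gamma)  # fresh i.i.d. observation
            u = u + gain * (y - a * u)              # fixed-gain 3DVAR step
            avg += (u - avg) / n                    # running iterate average
        mse_last += (u - u_true) ** 2
        mse_avg += (avg - u_true) ** 2
    return mse_last / n_trials, mse_avg / n_trials


if __name__ == "__main__":
    last, avg = simulate_3dvar(n_steps=500, n_trials=200)
    print("last iterate MSE:", last)
    print("averaged iterate MSE:", avg)
```

In this setup the error of the last iterate is expected to plateau, while the error of the average should keep shrinking as `n_steps` grows.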
Applying (2.7) to this expression, we immediately obtain
For the variance, we have the result
Proof of Theorem 3.1.
The result then follows immediately by combining the two preceding propositions.
3.2. Kalman Filter
Next, we provide a proof of Theorem 1.8, showing there is no improvement in mean squared convergence of Kalman under iterate averaging.
Using Lemma 2.4, for the -th estimate of the mean,
and without averaging,
Then, with averaging, for the bias,
For the variance, first note
Then, by dropping all but the -th term in the inner sum,
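A numerical illustration of this lack of improvement, not a proof, is sketched below. It assumes the textbook scalar Kalman update for a static state, and every name and parameter value (`simulate_kalman`, `m0`, `c0`, and so on) is hypothetical.

```python
import random


def simulate_kalman(n_steps, n_trials, a=1.0, u_true=1.0, gamma=1.0,
                    m0=0.0, c0=1.0, seed=0):
    """Estimate the mean square error of the scalar Kalman mean by Monte Carlo.

    Returns (mse_last, mse_avg): the estimated mean square error of the
    final Kalman mean and of the arithmetic mean of all Kalman means.
    """
    rng = random.Random(seed)
    gamma2 = gamma * gamma
    mse_last = 0.0
    mse_avg = 0.0
    for _ in range(n_trials):
        m, c = m0, c0
        avg = 0.0
        for n in range(1, n_steps + 1):
            y = a * u_true + rng.gauss(0.0, gamma)  # fresh i.i.d. observation
            k = c * a / (a * a * c + gamma2)        # Kalman gain
            m = m + k * (y - a * m)                 # mean update
            c = (1.0 - k * a) * c                   # variance update
            avg += (m - avg) / n                    # running average of means
        mse_last += (m - u_true) ** 2
        mse_avg += (avg - u_true) ** 2
    return mse_last / n_trials, mse_avg / n_trials


if __name__ == "__main__":
    last, avg = simulate_kalman(n_steps=500, n_trials=400)
    print("Kalman mean MSE:", last)
    print("averaged Kalman mean MSE:", avg)
```

Here the averaged mean mixes in early, less accurate estimates, so in this setup it is not expected to do better than the final Kalman mean.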
4. Analysis of the Infinite Dimensional Problem
We return to the bias and variance of 3DVAR in the general, potentially infinite dimensional, setting and obtain estimates on the terms.
4.1. General Case
Here, we prove Theorem 1.6.
The fastest possible decay available for the squared bias in Proposition 4.1 is when and .
We make use of the bias term from Lemma 2.8, allowing us to write
Next, using Proposition 2.3 again, now with ,
The last inequality holds since, and , so that allowing for the application of Lemma 2.2. ∎
Under Assumption 1, for , , and for this choice of and , assume is trace class. Then
The fastest possible decay in the variance will be when and is sufficiently large such that is trace class. However, the bias term requires . This requires the identity operator to be trace class which will not hold in infinite dimensions.
We begin with equation (2.11) and using that for any bounded operator and positive self-adjoint trace class operator , ,
Proof of Theorem 1.6.
The theorem immediately follows from the two preceding propositions. ∎
4.2. Simultaneous Diagonalization
A sharper result is available under the simultaneous diagonalization Assumption 2. Indeed, let us assume that and simultaneously diagonalize against the orthonormal set , with eigenvalues
Also, observe that, letting
we have the relationship