Inverse problems arise naturally in a variety of scientific disciplines, where the relationship between the quantity of interest and the data collected in an experiment is determined by the physics of the underlying system and can be mathematically modelled. Real-world measurements are always discrete and carry statistical noise, which is often most naturally modelled by independent Gaussian random variables. The observation scheme then gives rise to an inverse regression model of the form
where the forward operator is assumed to be linear between separable Hilbert spaces .
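For concreteness, a standard discrete inverse regression model of this type can be written as follows; the symbols ($Y_i$ for observations, $x_i$ for design points, $\sigma$ for the noise level, $\xi_i$ for the noise variables) are illustrative notation, not necessarily the paper's own:

```latex
% Illustrative form of a discrete inverse regression model (1):
% n noisy point evaluations of Af with i.i.d. standard Gaussian noise.
Y_i = (Af)(x_i) + \sigma \xi_i,
\qquad \xi_i \overset{\text{iid}}{\sim} N(0,1),
\qquad i = 1, \dots, n.
```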
However, the formulation and analysis of the inverse problem are usually best carried out via restrictions of an underlying continuous model. This guarantees, among other things, discretisation invariance, allowing one to switch consistently between different discretisations [11, 29, 30, 48]. Thus, in this paper we consider the continuous (nonparametric) linear inverse problem of recovering an unknown function from a noisy indirect measurement
Model (2) is asymptotically equivalent to (1) when and is assumed to be Gaussian white noise in the separable Hilbert space [4, 42]. Note that while can be defined by its actions on , it does not take values there almost surely.
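A common way of writing the continuous white noise model (2) and the action of its noise process is sketched below; the symbols ($Y$, $A$, $f$, $\varepsilon$, $\mathbb{W}$, $H_2$) are assumed notation for illustration:

```latex
% Illustrative form of the continuous model (2):
Y = Af + \varepsilon \mathbb{W}, \qquad \varepsilon > 0,
% where the white noise \mathbb{W} is defined through its action on
% test elements g \in H_2 (it does not take values in H_2 itself):
\langle \mathbb{W}, g \rangle \sim N\bigl(0, \|g\|_{H_2}^2\bigr),
\qquad
\mathrm{Cov}\bigl(\langle \mathbb{W}, g_1 \rangle,
                  \langle \mathbb{W}, g_2 \rangle\bigr)
  = \langle g_1, g_2 \rangle_{H_2}.
```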
In the present paper we follow the Bayesian approach to inverse problems, employing a standard nonparametric procedure based on a centred Gaussian prior for , see [11, 48]. The solution to the statistical inverse problem is then the conditional distribution of given
, the mean or mode of which can be used as a point estimator. The main appeal of the method is, however, that it automatically delivers quantification of the uncertainty in the reconstruction, obtained through credible sets, i.e. regions of the parameter space with specified high posterior probabilities. In many applications this methodology can be efficiently implemented using modern MCMC algorithms that allow fast sampling from the posterior distribution.
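A minimal numerical sketch of this conjugacy is given below in a toy diagonalised version of the model; the diagonal sequence-space setting and all numerical choices (decay rates, noise level, truncation) are our own illustration, not taken from the paper:

```python
import math
import random

# Toy diagonal linear inverse problem: y_k = a_k * f_k + eps * w_k,
# with a centred Gaussian prior f_k ~ N(0, lam_k). All values below
# are illustrative choices.
random.seed(0)
K, eps = 200, 0.01
a = [(k + 1) ** -1.0 for k in range(K)]     # decaying singular values (smoothing A)
lam = [(k + 1) ** -2.0 for k in range(K)]   # prior variances
f0 = [(k + 1) ** -1.5 for k in range(K)]    # "true" coefficients generating the data
y = [a[k] * f0[k] + eps * random.gauss(0, 1) for k in range(K)]

# Conjugacy: the posterior of f_k given y_k is Gaussian with
#   mean m_k = a_k lam_k y_k / (a_k^2 lam_k + eps^2),
#   var  v_k = lam_k eps^2 / (a_k^2 lam_k + eps^2).
m = [a[k] * lam[k] * y[k] / (a[k] ** 2 * lam[k] + eps ** 2) for k in range(K)]
v = [lam[k] * eps ** 2 / (a[k] ** 2 * lam[k] + eps ** 2) for k in range(K)]

# A 95% credible interval for the first coefficient, read off directly
# from the Gaussian posterior -- no sampling needed in this conjugate case.
lo = m[0] - 1.96 * math.sqrt(v[0])
hi = m[0] + 1.96 * math.sqrt(v[0])
print(f"95% credible interval for the first coefficient: [{lo:.4f}, {hi:.4f}]")
```

In non-conjugate or discretised settings the same posterior functionals would instead be approximated by MCMC sampling, as mentioned above.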
In the Bayesian approach the prior distribution serves as a regularisation tool, and it is natural to ask whether the methodology delivers correct, prior-independent - and if so, in some sense optimal - inference on the unknown parameter in the small noise limit. These questions can be addressed under the frequentist assumption that is in reality generated through the scheme (2) by a fixed true solution (instead of being randomly drawn from ). We then investigate how the posterior distribution concentrates around when . The speed of convergence can be characterised through posterior contraction rates, first studied in [17, 46], and further investigated by [1, 2, 10, 24, 26, 28, 27, 25, 41, 55] among others. See also [37, 39, 40] for results on non-linear inverse problems.
However, determining whether the associated uncertainty quantification is objectively valid requires finer analysis of the posterior distribution. The central question is: Do credible sets have the correct frequentist coverage in the small noise limit? That is, do we have, for some ,
with a small as ? The importance of the above question is not restricted to the Bayesian paradigm. In linear Bayesian inverse problems with Gaussian priors the conditional mean estimator coincides with the maximum a posteriori (MAP) estimator, which in turn can be shown to coincide with a Tikhonov regulariser with a Cameron–Martin space norm penalty, see [10, 20]. Thus, if (3) holds for a credible set centred at the posterior mean, we can use as an (asymptotic) frequentist confidence region for the Tikhonov regulariser .
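The equivalence just described can be made explicit in the Gaussian-conjugate case; the display below uses assumed notation ($\bar f$ for the posterior mean/MAP estimator, $\mathcal H$ for the Cameron–Martin space of the prior) and is a standard sketch rather than the paper's own statement:

```latex
% Posterior mean / MAP estimator as a Tikhonov regulariser with
% Cameron--Martin norm penalty (assumed notation):
\bar f \;=\; \arg\min_{h \in \mathcal H}
  \Bigl( \frac{1}{2\varepsilon^2} \| Y - A h \|_{H_2}^2
         \;+\; \frac{1}{2} \| h \|_{\mathcal H}^2 \Bigr).
```

Hence frequentist coverage statements for credible sets centred at $\bar f$ translate directly into confidence statements for this variational estimator.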
Obtaining optimal contraction rates is not enough to answer the above question even in the parametric case. For finite-dimensional models the Bernstein–von Mises (BvM) theorem establishes, under mild assumptions, that the posterior distribution is approximated by a Gaussian distribution centred at the maximum likelihood estimator and with minimal covariance matrix. As a consequence, credible sets coincide asymptotically with frequentist confidence regions (see, e.g., [53, Chapter 10]). On the other hand, understanding the frequentist properties of nonparametric credible sets is a more delicate matter. It was observed by , and later in , that the BvM phenomenon may fail to hold even in a simple nonparametric regression model, where credible balls in are shown to have null asymptotic coverage.
Positive results, both in the direct and inverse setting, have been obtained in subsequent developments [6, 7, 26, 31, 49]. In particular, [6, 7] showed that a natural way of investigating the nonparametric BvM phenomenon is from a semiparametric perspective, by showing the weak convergence of the posterior to a fixed infinite-dimensional Gaussian distribution on a large enough function space. Recently, this program has been successfully adapted to inverse problems: a semiparametric result was obtained in  for geodesic X-ray transforms, while a nonparametric BvM theorem was proved in  for the non-linear inverse problem of recovering the potential term in an elliptic partial differential equation (PDE); see also  for non-linear inverse problems with jump processes.
In this paper we follow ideas presented in  by extending the results to linear inverse problems of the general form (2). In particular, we prove BvM theorems for functionals , with a large family of test functions , which entails the convergence of to a limiting Gaussian process with optimal covariance structure that recovers the semiparametric information lower bound. As a consequence, we deduce the statistical efficiency of plug-in Tikhonov regularisers
and that credible intervals centred at such estimators constitute asymptotically valid and optimal confidence intervals. The applicability of the general theory is illustrated by deriving sufficient conditions on the test functions for the BvM phenomenon to occur in the case of recovering an unknown source function in elliptic boundary value problems (BVP), and in a BVP for the heat equation. We then show, for the elliptic BVP example, in which the properties of the crucial ‘inverse Fisher information’ operator are well understood (by PDE theory), that the techniques employed previously can be refined to further relax the assumptions on the test functions so that they depend only on the smoothing properties of . Finally, by requiring slightly stronger smoothness, we adapt the program laid out in  to the problem at hand, and obtain a nonparametric BvM theorem which implies that certain nonparametric credible sets built around the Tikhonov regulariser have asymptotically correct coverage and optimal diameter. Note that we do not make additional assumptions about the smoothness of . Instead of assuming a source condition to achieve convergence in a desired space, we study convergence in a larger space defined by the smoothness of .
This article is organised as follows: we introduce the general setting in Section 2.1, and state and prove the semiparametric BvM theorem for linear functionals of the unknown in Section 2.2. In Section 3 we deduce from the previous results the asymptotic normality of and the coverage properties of credible intervals. Section 4 is dedicated to the examples. Finally, in Section 5 we formulate the nonparametric BvM theorem for the problem of recovering the source function in an elliptic BVP. Appendix A provides some background on the theory of semiparametric statistical efficiency.
2 General posterior results
2.1 The Bayesian approach for linear inverse problems
We are interested in the following continuous (nonparametric) model for indirect measurements
The forward operator is assumed to be linear, bounded and injective, and are separable Hilbert spaces of real-valued functions defined on , which are - and -dimensional Riemannian manifolds, respectively. Below we often denote , . The forward operator has a well-defined adjoint for which
for all and .
Suppose there exists a separable Hilbert space , such that is continuous and that is dense in the norm of . In particular, for some ,
Note that the more smoothing the forward operator is, the larger we can choose . For example, if we assume that is a -times smoothing elliptic differential operator we may choose .
The noise amplitude is modelled by . The measurement noise is a centred Gaussian white noise process with covariance
Below we often write for the random variable . Observing data means that we observe a realisation of the Gaussian process with marginal distributions , and we denote by the probability space supporting .
Let be the law of for fixed . Arguing as in [37, Section 7.4], we can use the law of as a common dominating measure, and then apply the Cameron–Martin theorem (e.g. Proposition 6.1.5 in ) to define the likelihood function
We assume that follows a prior measure which is a Gaussian Borel probability measure on and denote its reproducing kernel Hilbert space (RKHS), or Cameron–Martin space, by . Noticing (again, as argued in [37, Section 7.4]) that the likelihood given by (6) can be taken to be jointly measurable, we can then use Bayes' theorem to deduce that the posterior distribution of arising from observation (4) can be written as
We are interested in analysing under the assumption that the measurement is generated from a true deterministic unknown , that is, when in (4) . In the following we assume that the prior satisfies the following concentration condition for a given .
Let be a Gaussian Borel probability measure on a separable Hilbert space and let be its RKHS. Define the concentration function of for a fixed as
We assume that for a fixed and some sequence , such that , as , satisfies
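For reference, the concentration function of a Gaussian prior and a typical form of the resulting concentration condition are sketched below; the symbols ($\varphi_{f_0}$, $\Pi$, $\mathcal H$, $\delta_\varepsilon$) are assumed notation following the standard Gaussian-process literature, not necessarily the paper's own:

```latex
% Concentration function of a Gaussian measure \Pi with RKHS \mathcal H
% at the truth f_0 (standard definition, assumed notation):
\varphi_{f_0}(\delta)
  \;=\; \inf_{h \in \mathcal H :\, \|h - f_0\| \le \delta}
          \tfrac{1}{2} \|h\|_{\mathcal H}^2
        \;-\; \log \Pi\bigl( f : \|f\| \le \delta \bigr),
  \qquad \delta > 0.
% A typical concentration condition with rate \delta_\varepsilon:
\varphi_{f_0}(\delta_\varepsilon) \;\le\; \delta_\varepsilon^2 / \varepsilon^2,
  \qquad \delta_\varepsilon \to 0, \quad \delta_\varepsilon / \varepsilon \to \infty.
```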
2.2 A semiparametric Bernstein–von Mises theorem
Next we formulate a semiparametric Bernstein–von Mises theorem for linear inverse problems. Theorem 3 below states the convergence of random laws in probability, which means that for any metric for weak convergence of laws, the real random variables converge to zero in probability, see .
Let be the law generating , where fulfils Condition 1, is white noise in , and is the noise level. Let be the posterior distribution arising from observations (4) and prior with satisfying Condition 2 for a given .
Let be such that , for all , and suppose , for some . Then we have in -probability
as , where
Note that, since is dense and is assumed to be uniformly continuous, we can extend continuously to .
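For orientation, the limiting law appearing in semiparametric BvM statements of this kind typically has the following shape; the notation below ($\bar f$ for the posterior mean, $\tilde g$ with $\psi = A^* \tilde g$) is our own illustration and the precise conditions are those of the theorem above:

```latex
% Illustrative shape of the semiparametric limit: if \psi = A^* \tilde g
% for some \tilde g \in H_2, then, under the posterior,
\mathcal{L}\Bigl( \varepsilon^{-1}\bigl( \langle f, \psi \rangle
    - \langle \bar f, \psi \rangle \bigr) \,\Big|\, Y \Bigr)
  \;\longrightarrow\; N\bigl( 0, \|\tilde g\|_{H_2}^2 \bigr),
% whose variance \|(A^*)^{-1}\psi\|_{H_2}^2 is the semiparametric
% information lower bound for estimating \langle f, \psi \rangle.
```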
The proof follows ideas developed in  for the special case of being the X-ray transform. We start by showing that it is enough to consider convergence of instead of with some large enough set . Here denotes the posterior arising from the prior restricted to and renormalised. The second step is to find an appropriate set . We then proceed to study the moment generating function of under the posterior and finally conclude that converges weakly to .
Let be a Gaussian Borel probability measure in . Suppose that for a fixed its concentration function satisfies Condition 2 with some sequence , such that . Let be the posterior distribution arising from the measurement (4). Then for any Borel set for which
and all small enough we have
when in -probability. Above is the posterior arising from the prior restricted to and renormalised.
We start by noting that one can write and furthermore
which implies . Hence it suffices to prove the first limit.
We can write
Under , we have for any (see [37, Lemma 3])
Let be any probability measure on the set . Applying Jensen’s inequality to the exponential function we get for any
Denote where, again using Jensen's inequality, we can estimate
We can then conclude
where the last inequality follows from the standard Gaussian tail bound .
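The Gaussian tail bound invoked here, in its common form $P(Z > t) \le e^{-t^2/2}$ for $Z \sim N(0,1)$ and $t \ge 0$, can be checked numerically against the exact tail expressed through the complementary error function; this is an independent sanity check, not part of the proof:

```python
import math

def gaussian_tail_exact(t: float) -> float:
    """Exact P(Z > t) for Z ~ N(0,1), via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def gaussian_tail_bound(t: float) -> float:
    """The standard sub-Gaussian bound exp(-t^2/2), valid for t >= 0."""
    return math.exp(-t * t / 2.0)

# The bound dominates the exact tail on a grid of t values.
for t in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
    assert gaussian_tail_exact(t) <= gaussian_tail_bound(t)
```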
Next we choose and let
Using the above with we see that . Using Markov's inequality, denoting the expectation with respect to , it suffices to prove that
tends to zero when . Since we see that
The second term can be written as
where is the RKHS of . Following the approach of [19, Proposition 2.6.19] we next show that
Let be such that . Then . We denote . Using the Cameron–Martin theorem [3, Corollary 2.4.3] and the fact that is a centred Gaussian random variable we can write
where and . The last inequality follows from the fact that for all . We can then conclude
Let be such that
for all and with some . Note that, since is dense and is uniformly continuous, we can extend continuously to . When we have and the standard Gaussian tail bound guarantees for all that
Hence we can choose
and restrict to studying the posterior distribution arising from the prior .