1 Introduction
The Bayesian perspective on inverse problems has attracted much mathematical attention in recent years (Kaipio and Somersalo, 2005; Stuart, 2010). Particular attention has been paid to Bayesian inverse problems (BIPs) in which the parameter to be inferred lies in an infinite-dimensional space, a typical example being a scalar or tensor field coupled to some observed data via an ODE or PDE. Numerical solution of such infinite-dimensional BIPs must necessarily be performed in an approximate manner on a finite-dimensional subspace, but it is profitable to delay discretisation to the last possible moment and consider the original infinite-dimensional problem as the primary object of study, since infinite-dimensional well-posedness results and algorithms descend to finite-dimensional approximations in a discretisation-independent way, whereas careless early discretisation may lead to a sequence of well-posed finite-dimensional BIPs or algorithms whose stability properties degenerate as the discretisation dimension increases.
The infinite-dimensional (or, in statistical language, nonparametric) context presents a number of computational and theoretical challenges. The development of sampling algorithms that are well defined in infinite-dimensional spaces, and hence robust in high finite dimension, is an interesting topic in itself; the preconditioned Crank–Nicolson (pCN) method of Cotter et al. (2013) is a representative example in this area. On a more fundamental level, though, the specification of the Bayesian prior and likelihood must be treated with care in order to ensure that the posterior measure is well posed in the sense of Hadamard, i.e. is well defined and depends 'nicely' upon the observed data and other problem ingredients.
It has become clear over the last two decades that it is not sufficient to study only the finite-dimensional formulation of BIPs, and some analysis of the infinite-dimensional limit is essential. In the Finnish school of inverse problems, this notion of infinite-dimensional well-posedness became known as discretisation invariance. Its importance was highlighted by a seminal paper on edge-preserving image reconstruction using the discrete total variation prior (Lassas and Siltanen, 2004). The model problem in this paper was the reconstruction, using n pixels, of a one-dimensional continuum image from k-dimensional linear observations subject to additive Gaussian noise. Corresponding to a prior belief that the true image is piecewise smooth with a few jump discontinuities, the discrete total variation (TV) prior on ℝ^n is used. However, Lassas and Siltanen (2004) showed that there is no way to consistently scale the discrete TV prior so that the Bayesian prior and posterior (and derived summary quantities such as conditional means and maximum a posteriori estimators) all have meaningful limits as n → ∞. This negative result inspired a search for priors that would provide edge-preserving and discretisation-invariant Bayesian image reconstruction. It has recently been suggested, based upon numerical experiments, that Cauchy difference priors may be suitable priors for edge-preserving Bayesian image reconstruction (Markkanen et al., 2016). However, in this case, the well-posedness theory of Stuart (2010) must be extended to the heavy-tailed setting. In this respect, this work forms part of an ongoing investigation of BIPs with heavy-tailed or other complicated structure far from the simple Gaussian regime (Dashti et al., 2012; Hosseini, 2017; Hosseini and Nigam, 2017; Lassas et al., 2009).
2 Bayesian inverse problems
This section reviews the essential notions for the study of BIPs in function spaces, following the style of Stuart (2010). We fix two spaces U and Y; the space U contains an unknown u that we wish to recover from data y in Y. We posit that y is in fact a noisily perturbed version of G(u), where G: U → Y is a known function, referred to as a forward operator. The prototypical example to keep in mind is the case of additive Gaussian noise:
y = G(u) + η,  (2.1)
where η is a centred Y-valued Gaussian random variable with covariance operator Γ. Naturally, many other noise models can arise in practice. The challenge of recovering u from y is known as an inverse problem.
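The additive Gaussian data model (2.1) is easy to sketch numerically. In the following minimal example, the forward operator G (a discrete Gaussian blur), the dimensions, and the noise level are all hypothetical illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# A minimal sketch of generating data from the additive noise model (2.1).
# The forward operator G, the sizes, and the noise level are illustrative.
n = 50
idx = np.arange(n)
G = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 3.0) ** 2)
G /= G.sum(axis=1, keepdims=True)            # row-normalised smoothing matrix

u_true = np.where(idx < n // 2, 1.0, -1.0)   # piecewise-constant "image" u

sigma = 0.05                                 # noise covariance Gamma = sigma^2 I
y = G @ u_true + sigma * rng.standard_normal(n)   # data y = G(u) + eta
```

Recovering `u_true` from `y` is then a (discretised) inverse problem in the above sense.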
Inverse problems are typically ill posed in the sense of having no solution in the strict sense, or multiple solutions, or solutions that depend very sensitively upon the problem setup, and in particular upon the data y. To deal with the existence problem, one typically relaxes the notion of solution and seeks a minimiser of a suitable misfit functional Φ(·; y): U → ℝ. In the additive Gaussian case, the appropriate functional is a weighted least-squares functional:
Φ(u; y) := (1/2) ‖Γ^{−1/2} (y − G(u))‖².  (2.2)
Such problems can still be ill-conditioned, and in order to rectify this and also to incorporate prior beliefs about u (e.g. that u should be small in some norm) one typically minimises a regularised version of the misfit:
J(u) := Φ(u; y) + R(u).  (2.3)
For example, on U = ℝ^n, the regularisation functional R could be the squared Euclidean norm ‖u‖₂², corresponding to ridge regression; or the ℓ¹ norm ‖u‖₁, corresponding to the sparsity-promoting LASSO; or the TV norm alluded to in the introduction.

The Bayesian interpretation of the inverse problem is to interpret both u and y as random variables. One first posits a prior measure μ₀ for u — in the case of PDE-constrained inverse problems, μ₀ typically encodes beliefs about the smoothness of u in that it charges a suitable Sobolev or Hölder space with full mass. Eq. (2.1) then determines the conditional distribution of y given u, the likelihood; the Bayesian solution to the inverse problem is the conditional distribution of u given y, the posterior distribution. If U and Y were finite sets, then this posterior would be given in terms of the prior and likelihood by Bayes' rule; informally, the posterior is the likelihood times the prior, normalised to be a probability measure. It is also helpful to think of the prior as having probability density proportional to exp(−R(u)), and the posterior as having density proportional to exp(−Φ(u; y) − R(u)).

However, particularly when U is an infinite-dimensional space, Bayes' rule must be stated more carefully, since there is no uniform reference measure (counting measure or Lebesgue measure) with respect to which one can take densities. This generalisation of Bayes' rule is the content of Theorem 4.1 below. It is also of interest to study the stability of the Bayesian posterior under approximation/error in the observed data, the likelihood, or the prior; this is the content of Theorem 4.2 and the discussion thereafter.
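In finite dimensions, the informal recipe "posterior density ∝ exp(−Φ − R)" can be realised directly on a grid. The following one-dimensional sketch uses an illustrative forward map G(u) = u³, an ℓ¹-type regulariser, and made-up numbers; none of these choices come from the text.

```python
import numpy as np

# A one-dimensional, gridded illustration of "posterior ∝ likelihood × prior".
# The forward map G(u) = u^3 and all numbers are illustrative choices.
u = np.linspace(-5.0, 5.0, 2001)
du = u[1] - u[0]

y, sigma = 1.0, 0.5
Phi = 0.5 * ((y - u**3) / sigma) ** 2   # Gaussian misfit, cf. (2.2)
R = np.abs(u)                           # an l1-type regulariser (Laplace log-prior)

unnormalised = np.exp(-Phi - R)         # posterior density up to a constant
posterior = unnormalised / (unnormalised.sum() * du)
post_mean = (u * posterior).sum() * du  # a posterior summary statistic
```

Note that this grid-based normalisation is exactly what fails to generalise to infinite-dimensional U, where no Lebesgue reference measure exists.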
3 Stable distributions in quasi-Banach spaces
Cauchy distributions have recently been proposed as discretisation-invariant edge-preserving priors by Markkanen et al. (2016). Cauchy distributions are an example of the more general stable distributions, which are themselves generalised by infinitely divisible distributions (Hosseini, 2017). This section surveys how stable distributions can be defined and sampled by a Karhunen–Loève-like expansion that is valid not just on Banach spaces but on even more general spaces. While separability and completeness remain desirable attributes, it turns out that the triangle inequality can be easily relaxed.
A quasi-Banach space U is a vector space equipped with a quasinorm ‖·‖, i.e. a function that is positive-definite and homogeneous and satisfies, for some choice of constant C ≥ 1, the weakened triangle inequality

‖x + x′‖ ≤ C (‖x‖ + ‖x′‖) for all x, x′ ∈ U,  (3.1)

and that is Cauchy-complete with respect to this quasinorm. Examples of such spaces include the sequence spaces ℓ^p and the Lebesgue spaces L^p for 0 < p < 1.
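A two-dimensional computation makes the failure of the plain triangle inequality in ℓ^p, 0 < p < 1, concrete; the standard modulus C = 2^{1/p − 1} in (3.1) is attained on the example below.

```python
import numpy as np

# In l^p with 0 < p < 1, ||x|| = (sum_i |x_i|^p)^(1/p) is a quasinorm only:
# the usual triangle inequality fails, but the weakened form (3.1) holds with
# modulus C = 2^(1/p - 1).  A two-dimensional check with p = 1/2:
p = 0.5
quasinorm = lambda x: np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])

lhs = quasinorm(x + z)              # (1 + 1)^(1/p) = 4
rhs = quasinorm(x) + quasinorm(z)   # 1 + 1 = 2: plain triangle inequality fails
C = 2.0 ** (1.0 / p - 1.0)          # weakened inequality (3.1) holds with this C
```

Here lhs = C · rhs exactly, so the modulus C = 2^{1/p − 1} cannot be improved for ℓ^{1/2}.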
A U-valued random variable X is said to be stable of order α ∈ (0, 2] if, when X₁, …, Xₙ are independent and identically distributed copies of X, X₁ + ⋯ + Xₙ is equal in distribution to n^{1/α} X + dₙ for some dₙ ∈ U. Equivalently, in terms of the law μ of X and the rescaling s_a: x ↦ ax,

μ^{∗n} = (s_{n^{1/α}})_{#} μ ∗ δ_{dₙ}.  (3.2)
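The stability property is easy to check by simulation in the scalar Cauchy case, where α = 1 and the rescaling constant is simply n. The sample sizes below are illustrative; the median of |X| is used as a test statistic because it is robust to the Cauchy distribution's heavy tails.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo sanity check of stability for the standard Cauchy distribution
# (alpha = 1, so the rescaling constant is n): X_1 + ... + X_n =_d n * X.
n, m = 5, 200_000
sums = rng.standard_cauchy((m, n)).sum(axis=1) / n   # n-fold sums, rescaled by n

# The rescaled sums should again be standard Cauchy; the median of |X| for a
# standard Cauchy X is exactly 1 (since P(|X| <= 1) = 2 arctan(1) / pi = 1/2).
med = np.median(np.abs(sums))
```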
Stability is a particularly appealing property if the aim is to construct prior measures for BIPs that are ‘physically consistent’ in the sense of remaining in the same model class regardless of discretisation or coordinate choices, at least when the ‘physical quantity’ obeys an additive law, e.g. the amount of mass or charge contained within a given physical region.
The stable distributions on ℝ are completely classified by four parameters: the index of stability α ∈ (0, 2] mentioned above, a skewness parameter β ∈ [−1, 1], a scale parameter γ ≥ 0, and a location parameter δ ∈ ℝ. We denote the unique such distribution by S(α, β, γ, δ). The case α = 2 is the case of a Gaussian (normal) distribution; the case α = 1, β = 0 is the Cauchy distribution. Crucially, when α < 2, X ∼ S(α, β, γ, δ) only has finite moments up to, but not including, order α (the exception being γ = 0, in which case X is deterministic and has moments of all orders).

To define and sample U-valued stable random variables, we take a Karhunen–Loève-style approach, i.e. we resort to series expansions with real stable random coefficients in a countable, unconditional, normalised Schauder basis (eₙ)_{n∈ℕ} of U, e.g. a polynomial, Fourier, or wavelet basis. We also assume that the basis and U are such that the synthesis operator is a continuous embedding of the sequence space ℓ^p of coefficients into U, i.e.
‖∑_{n∈ℕ} cₙ eₙ‖ ≤ C ‖(cₙ)_{n∈ℕ}‖_{ℓ^p}.  (3.3)
When U is a Banach space, this assumption holds with p = 1 for any choice of basis (eₙ)_{n∈ℕ}, since it is just the triangle inequality for an unconditionally convergent series in U. Since ‖c‖_{ℓ^p} ≤ ‖c‖_{ℓ^q} for q ≤ p, whenever (3.3) holds for p it also holds with p replaced by any q ≤ p. If inequality (3.3) can be reversed, possibly with a different constant, then the basis is known as a frame for U. The case p = 2 is the well-known notion of a Riesz basis.
The convergence theorem for such series expansions — which states that sufficiently rapid decay of the scale parameters γₙ implies almost sure convergence of the series in U — is broadly the same as the well-known Gaussian case α = 2, but with some logarithmic corrections (Sullivan, 2017):
Theorem 3.1.
Consider U_N := ∑_{n=1}^{N} uₙ eₙ, where the uₙ ∼ S(α, βₙ, γₙ, δₙ) are independent, with α ∈ (0, 2], βₙ ∈ [−1, 1], (δₙ)_{n∈ℕ} ∈ ℓ^p, and, in addition,

∑_{n∈ℕ} γₙ^p (1 + log(1/γₙ))^{p/α} < ∞,  (3.4)

with the convention that 0 · log(1/0) := 0. Then U := lim_{N→∞} U_N exists in U almost surely. Furthermore, for r with 0 < r < α, U_N → U in L^r as N → ∞ and, in particular,

E[‖U‖^r] < ∞.  (3.5)
In particular, in the case α = 1, we can define U-valued Cauchy random variables with moments up to but not including order 1 by using scalar Cauchy-distributed random coefficients whose scale parameters are 'slightly better than ℓ^p' in the sense that ∑_{n∈ℕ} (γₙ (1 + log(1/γₙ)))^p is finite.
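One draw from such a Cauchy series prior can be sampled directly from the expansion. In the sketch below, the Fourier sine basis, the truncation level, and the decay γₙ = n^{−2} are illustrative choices (the latter comfortably satisfies a logarithmically corrected summability condition of the above type).

```python
import numpy as np

rng = np.random.default_rng(2)

# One draw of a "Cauchy series" random function on [0, 1] in the spirit of
# the Karhunen-Loeve-type expansion above: U(t) = sum_n gamma_n xi_n e_n(t),
# with xi_n standard Cauchy.  Basis, truncation, and decay are illustrative.
t = np.linspace(0.0, 1.0, 513)
N = 1000
n = np.arange(1, N + 1)
gamma = n.astype(float) ** -2.0
xi = rng.standard_cauchy(N)

basis = np.sin(np.pi * np.outer(n, t))   # e_n(t) = sin(pi n t), |e_n| <= 1
U = (gamma * xi) @ basis                 # a sample path of the random field

# Contribution of the modes beyond the first 100; since |e_n| <= 1, it is
# bounded in sup norm by sum(gamma[100:] * |xi[100:]|).
tail = (gamma[100:] * xi[100:]) @ basis[100:]
```

Because the coefficients are heavy-tailed, individual draws can contain occasional large-amplitude modes, which is precisely the feature exploited for edge-preserving priors.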
4 Well-posedness of BIPs with stable priors
The well-posedness of the posterior with respect to perturbations of the problem setup is a topic of natural interest. Since there are many choices of (inequivalent) topology or metric upon the space of probability measures on U, this well-posedness can be quantified in many ways (Deza and Deza, 2016, Chapter 14). We choose to use the Hellinger metric d_H defined by

d_H(μ, ν)² := (1/2) ∫ ( √(dμ/dπ) − √(dν/dπ) )² dπ,  (4.1)
where π is an arbitrary choice of reference measure with respect to which both μ and ν are absolutely continuous. The Hellinger topology is stronger than the weak topology, equivalent to the total variation topology, and weaker than the relative entropy (Kullback–Leibler) topology. It appears quite naturally in the well-posedness analysis of BIPs, and has the advantage of controlling the expected values of square-integrable quantities of interest:

|E_μ[f] − E_ν[f]| ≤ 2 ( E_μ[|f|²] + E_ν[|f|²] )^{1/2} d_H(μ, ν).  (4.2)
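For simple one-dimensional measures, the Hellinger distance (4.1) can be computed by quadrature and checked against a closed form. The Gaussian example below uses the standard identity, in the convention of (4.1), d_H² = 1 − exp(−(m₁ − m₂)²/(8σ²)) for two Gaussians with equal variance σ²; the grid and parameter values are illustrative.

```python
import numpy as np

# Hellinger distance (4.1) between two equal-variance Gaussian densities,
# computed on a grid with Lebesgue measure as the reference measure pi,
# and compared with the closed form d_H^2 = 1 - exp(-(m1 - m2)^2 / (8 sigma^2)).
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
m1, m2, sigma = 0.0, 1.0, 1.0

p = np.exp(-0.5 * ((x - m1) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
q = np.exp(-0.5 * ((x - m2) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

d_H = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)
d_H_exact = np.sqrt(1.0 - np.exp(-((m1 - m2) ** 2) / (8.0 * sigma**2)))
```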
The basic result ensuring that the posterior distribution μ^y for u exists and satisfies the appropriate generalisation of Bayes' rule is as follows (Stuart, 2010; Sullivan, 2017):
Theorem 4.1.
Let U and Y be quasi-Banach spaces and let μ₀ be a prior distribution on U. Suppose that Φ: U × Y → ℝ is locally bounded and Carathéodory (i.e. continuous in y and measurable in u). Suppose also that, for every r > 0, there exists g_r: U → ℝ so that

Φ(u; y) ≥ g_r(u) for all u ∈ U and all y ∈ Y with ‖y‖ ≤ r,  (4.3)

and such that exp(−g_r) ∈ L¹(U, μ₀). Then, for every y ∈ Y, it follows that the normalising constant Z(y) := ∫_U exp(−Φ(u; y)) dμ₀(u) is strictly positive and finite and

(dμ^y/dμ₀)(u) = exp(−Φ(u; y)) / Z(y)  (4.4)

does indeed define a Borel probability measure μ^y on U, which is Radon if μ₀ is Radon, and is the posterior distribution of u conditioned upon the data y.
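In one dimension, the normalising constant Z(y) and the posterior density (4.4) can be evaluated by quadrature. The sketch below uses a standard Cauchy prior μ₀, the identity forward map G(u) = u, and made-up data; all of these are illustrative choices.

```python
import numpy as np

# The normalising constant Z(y) and posterior density (4.4) for a scalar toy
# problem with a standard Cauchy prior mu_0.  Forward map and numbers are
# illustrative.
u = np.linspace(-50.0, 50.0, 200001)
du = u[1] - u[0]

prior = 1.0 / (np.pi * (1.0 + u**2))     # standard Cauchy prior density
y, sigma = 2.0, 1.0
Phi = 0.5 * ((y - u) / sigma) ** 2       # Gaussian misfit with G(u) = u

Z = np.sum(np.exp(-Phi) * prior) * du    # Z(y) = int exp(-Phi) dmu_0, in (0, 1]
posterior = np.exp(-Phi) * prior / Z     # density of mu^y w.r.t. Lebesgue measure
post_mean = np.sum(u * posterior) * du   # pulled from y = 2 towards the prior at 0
```

Since 0 ≤ exp(−Φ) ≤ 1 here, Z(y) ∈ (0, 1] automatically, illustrating the positivity and finiteness asserted by the theorem in this easy case.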
The possibility of a lower bound g_r in Eq. (4.3) that tends to −∞ as ‖u‖ → ∞ seems counterintuitive from a finite-dimensional perspective. However, such g_r do arise naturally in the infinite-dimensional setting, and indeed are unavoidable: for example, in the case of a Gaussian prior and likelihood, the naïve formulation of Φ as a quadratic misfit would be almost surely infinite, so the Cameron–Martin theorem must be used to 'subtract off the infinite part of Φ', making it almost surely finite, but at the cost of a lower bound g_r that tends to −∞ as ‖u‖ → ∞ (Stuart, 2010, Remark 3.8).
In the case of a Cauchy prior, Theorems 3.1 and 4.1 combine to say, informally, that a BIP with a Cauchy prior has a well-defined posterior provided that Φ(u; y) does not escape to −∞ any faster than logarithmically in ‖u‖. This is to be contrasted with the polynomial growth rate that is permitted under a Gaussian prior.
The BIP can also be shown to be well-posed in the sense of depending continuously in the Hellinger metric upon perturbations of the data y and the potential Φ (Stuart, 2010; Sullivan, 2017). Indeed, with a possibly different constant, the BIP inherits the local Lipschitz continuity of Φ:
Theorem 4.2.
Suppose that, in addition to the conditions of Theorem 4.1, for every r > 0, there exists h_r: U → ℝ such that, for all u ∈ U and all y, y′ ∈ Y with ‖y‖, ‖y′‖ ≤ r,

|Φ(u; y) − Φ(u; y′)| ≤ h_r(u) ‖y − y′‖.  (4.5)

Suppose also that exp(−g_r) h_r² ∈ L¹(U, μ₀). Then the posterior μ^y depends locally Lipschitz continuously in the Hellinger metric upon the observed data y in the sense that, for all r > 0, there exists a constant C_r ≥ 0 such that, for all y, y′ ∈ Y with ‖y‖, ‖y′‖ ≤ r,

d_H(μ^y, μ^{y′}) ≤ C_r ‖y − y′‖.  (4.6)
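The Lipschitz dependence (4.6) can be observed numerically by perturbing the data and measuring the Hellinger distance between the resulting posteriors. The toy problem below (standard Cauchy prior, identity forward map, made-up numbers) is purely illustrative.

```python
import numpy as np

# Numerical sketch of the local Lipschitz dependence (4.6) for a scalar toy
# problem: perturb the data y and measure the Hellinger distance between the
# two posteriors.  Prior, forward map, and numbers are illustrative.
u = np.linspace(-50.0, 50.0, 200001)
du = u[1] - u[0]
prior = 1.0 / (np.pi * (1.0 + u**2))     # standard Cauchy prior density

def posterior(y, sigma=1.0):
    w = np.exp(-0.5 * ((y - u) / sigma) ** 2) * prior
    return w / (w.sum() * du)

def hellinger(p, q):
    # d_H for densities p, q with Lebesgue reference measure, as in (4.1)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * du)

hs = (0.2, 0.1, 0.05)
dists = [hellinger(posterior(1.0), posterior(1.0 + h)) for h in hs]
ratios = [d / h for d, h in zip(dists, hs)]   # roughly constant if (4.6) holds
```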
Similar results to Theorem 4.2 can be shown for the well-posedness of μ^y with respect to perturbation of Φ, e.g. as a result of incorporating an approximate numerical solution of the forward operator G, which might be an ODE or PDE solution operator. Again, similarly to (4.5), the key property is for the approximation error in Φ to be controlled by a bound that is exponentially integrable with respect to the prior μ₀; the error rate for the forward problem, e.g. as a function of mesh size, then transfers to the BIP with a possibly different constant prefactor.
On the other hand, the well-posedness of BIPs with respect to the prior μ₀ is a subtle topic, especially if the model is misspecified (i.e. there is no parameter value in U that corresponds to the 'truth'). Of particular note in this setting is the brittleness phenomenon highlighted by Owhadi et al. (2015a, b): not only does the posterior depend upon the prior, as would be expected, but it can do so in a highly discontinuous way, in the sense that any pre-specified quantity of interest can be made to have any desired posterior expectation value after an arbitrarily small perturbation of the prior in the common-moments, weak, total variation, or Hellinger topologies. This brittleness phenomenon takes place in the limit as the data resolution becomes infinitely fine, and it is a topic of current research whether approaches such as coarsening can in general yield robust inferences (Miller and Dunson, 2015).
Acknowledgements
The author is supported by the Free University of Berlin within the Excellence Initiative of the German Research Foundation (DFG).
References
 Cotter et al. [2013] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White. MCMC methods for functions: modifying old algorithms to make them faster. Statist. Sci., 28(3):424–446, 2013. doi:10.1214/13-STS421.
 Dashti et al. [2012] M. Dashti, S. Harris, and A. Stuart. Besov priors for Bayesian inverse problems. Inverse Probl. Imaging, 6(2):183–200, 2012. doi:10.3934/ipi.2012.6.183.
 Deza and Deza [2016] M. M. Deza and E. Deza. Encyclopedia of Distances. Springer, Berlin, fourth edition, 2016. doi:10.1007/978-3-662-52844-0.
 Hosseini [2017] B. Hosseini. Well-posed Bayesian inverse problems with infinitely divisible and heavy-tailed prior measures. SIAM/ASA J. Uncertain. Quantif., 5(1):1024–1060, 2017. doi:10.1137/16M1096372.
 Hosseini and Nigam [2017] B. Hosseini and N. Nigam. Well-posed Bayesian inverse problems: priors with exponential tails. SIAM/ASA J. Uncertain. Quantif., 5(1):436–465, 2017. doi:10.1137/16M1076824.
 Kaipio and Somersalo [2005] J. Kaipio and E. Somersalo. Statistical and Computational Inverse Problems, volume 160 of Applied Mathematical Sciences. Springer-Verlag, New York, 2005.
 Lassas and Siltanen [2004] M. Lassas and S. Siltanen. Can one use total variation prior for edge-preserving Bayesian inversion? Inverse Problems, 20(5):1537–1563, 2004. doi:10.1088/0266-5611/20/5/013.
 Lassas et al. [2009] M. Lassas, E. Saksman, and S. Siltanen. Discretization-invariant Bayesian inversion and Besov space priors. Inverse Probl. Imaging, 3(1):87–122, 2009. doi:10.3934/ipi.2009.3.87.
 Markkanen et al. [2016] M. Markkanen, L. Roininen, J. M. J. Huttunen, and S. Lasanen. Cauchy difference priors for edge-preserving Bayesian inversion with an application to X-ray tomography, 2016. arXiv:1603.06135.
 Miller and Dunson [2015] J. W. Miller and D. B. Dunson. Robust Bayesian inference via coarsening, 2015. arXiv:1506.06101.

 Owhadi et al. [2015a] H. Owhadi, C. Scovel, and T. Sullivan. On the brittleness of Bayesian inference. SIAM Rev., 57(4):566–582, 2015a. doi:10.1137/130938633.
 Owhadi et al. [2015b] H. Owhadi, C. Scovel, and T. Sullivan. Brittleness of Bayesian inference under finite information in a continuous world. Electron. J. Stat., 9(1):1–79, 2015b. doi:10.1214/15-EJS989.
 Stuart [2010] A. M. Stuart. Inverse problems: a Bayesian perspective. Acta Numer., 19:451–559, 2010. doi:10.1017/S0962492910000061.
 Sullivan [2017] T. J. Sullivan. Well-posed Bayesian inverse problems and heavy-tailed stable quasi-Banach space priors. Inverse Probl. Imaging, 11(5):857–874, 2017. doi:10.3934/ipi.2017040.